Open Miracle-Yao opened 1 month ago
Hello @Miracle-Yao
Please open each item as a separate issue going forward so we can get to them in order.
I will let @EricR86 get back to you about item 1.
Regarding 2, I think I know what's happening.
Originally the script was developed for a version of SGE that only supported task IDs >= 1.
Would you mind replacing all instances of
job_id = int(os.environ[args.var_id]) - 1
with
job_id = int(os.environ[args.var_id])
and also changing anywhere else where 1 is subtracted from the job ID?
If not resolved, please provide a link to the chromosome size file and fasta file so we can look into this.
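To make the suggestion for item 2 concrete, here is a minimal sketch, not the actual umap code. The helper name `resolve_job_id` and the `one_based` flag are hypothetical; only the `int(os.environ[...])` lookup comes from the script. It reads the task ID without subtracting 1 by default, and it raises a readable error when the variable is missing, which is the situation that would otherwise surface as `KeyError: 'SGE_TASK_ID'`.

```python
import os


def resolve_job_id(var_id, environ=None, one_based=False):
    """Return a zero-based job index read from a task-ID variable.

    var_id    -- name of the environment variable, e.g. "SGE_TASK_ID"
    environ   -- mapping to read from (defaults to os.environ)
    one_based -- hypothetical flag: True only if task IDs start at 1
    """
    env = os.environ if environ is None else environ
    raw = env.get(var_id)
    if raw is None:
        # Without this check, os.environ[var_id] raises a bare KeyError.
        raise RuntimeError(
            "{} is not set; export it or submit as an array job".format(var_id)
        )
    task_id = int(raw)
    # Subtract 1 only when the scheduler numbers tasks from 1.
    job_id = task_id - 1 if one_based else task_id
    if job_id < 0:
        raise ValueError("job index {} is negative".format(job_id))
    return job_id


# Task IDs that already start at 0 are used as-is.
print(resolve_job_id("SGE_TASK_ID", {"SGE_TASK_ID": "2441"}))  # 2441
```

If task IDs run from 0 to 2441 and 1 is still subtracted, the highest index reached is 2440, which would be consistent with the last chunk never being generated.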
Regarding 3, there is no straight answer, since the mappability of paired fragments varies with their length. I recommend considering the discussion in the manuscript and figure 7a to decide on the proper length. This could vary for each study as well.
@Miracle-Yao The tagged version is likely the correct one, regardless of the version reported by pypi/python. I suspect the version configured in the code was simply not kept up to date. The tagged commit is effectively the last working state, left as-is without much support for the time being.
Hi, @EricR86 @mehrankr
Thank you for your quick reply and useful suggestions.
Hi, @EricR86 @mehrankr
Thanks for providing an excellent program for calculating genome mappability. I'd like to inquire about a few questions I've had while running umap here.
1. I'm using the latest version I just downloaded (the umap-1.2.1 tag). It seems that umap-1.1.1-py2.7.egg is used during the installation and compilation process, so I'm not sure which version of umap I'm actually installing. Clarity on the version matters because I ran into some new issues in the next steps.
2. After I executed the ubismap.py script, I tried to get the file for the specified k (e.g. 50-mer) using the get_kmers.py script. The script I use is as follows:
Although the job ID is iterated from 0 to 2441, the generated files stop at chrY.2440.50.kmer.gz, and chrY.2441.50.kmer.gz is not produced. The log file is as follows:
I don't know why I get the KeyError: 'SGE_TASK_ID' error after running it. Also, the last record for chrY in the chrsize_index.tsv file looks like this: 2441 chrY 43000001 43547829. In other words, the "Created all sequences for chrY:43000001-43547829" step is missing. How can I avoid the KeyError: 'SGE_TASK_ID' error, and how can I generate the chrY.2441.50.kmer.gz file properly?
3. Most of the sequencing data in ENCODE is single-end, with read lengths of 24, 36, 50, and 100, so these lengths were used in your previous work to generate mappability tracks. With the widespread use of paired-end sequencing, especially PE150, it would be useful to integrate data from the earlier SE50 and SE100 runs with data from the current PE100 and PE150 runs. How should I choose the appropriate k-mer length: 50, 100, 200, or 300?
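For the missing chrY.2441.50.kmer.gz file, a small diagnostic along these lines could list every chunk in chrsize_index.tsv that has no k-mer file yet. This is only a sketch under assumptions: the function name `missing_kmer_files` is hypothetical, the index is assumed to hold whitespace-separated columns in the order index, chromosome, start, end (as in the record quoted above), and output files are assumed to be named chromosome.index.k.kmer.gz (as in chrY.2440.50.kmer.gz).

```python
import os


def missing_kmer_files(index_path, kmer_dir, k):
    """Return the expected k-mer file names that are absent from kmer_dir.

    Assumed index row layout: index, chromosome, start, end
    (for example "2441 chrY 43000001 43547829").
    """
    missing = []
    with open(index_path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines and any header row (first field not a number).
            if len(fields) < 2 or not fields[0].isdigit():
                continue
            index, chrom = fields[0], fields[1]
            # Assumed naming pattern: <chrom>.<index>.<k>.kmer.gz
            name = "{}.{}.{}.kmer.gz".format(chrom, index, k)
            if not os.path.exists(os.path.join(kmer_dir, name)):
                missing.append(name)
    return missing


# Example call (paths are placeholders):
# print(missing_kmer_files("chrsize_index.tsv", "kmers/k50", 50))
```

Any name it returns corresponds to an index row whose job either never ran or failed before writing its output.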
Thank you for your time and consideration. I am looking forward to your reply.