hoffmangroup / umap


Issues using Umap (1. where to download the latest version; 2. potential bug in unify_bowtie.py) #12

Open kyhkkm opened 1 year ago

kyhkkm commented 1 year ago

Hello, our team is working on a project on genome blacklists for different species, and we use Umap to generate the mappability files. I have run into two problems while using Umap:

  1. Your latest version is 1.2.0, but I can only download version 1.1.1 from conda or GitHub. How can I download the latest version?
  2. We noticed that after running 'unify_bowtie.py', the k-mer results for the first chromosome are always lost. For example, the output bowtie files cover chr2 to chrY, and all chr1 results are missing. My current workaround is to add chrMT as chr0 (before chr1), so that chr0 is lost instead and we get results for chr1 to chrY. We are not sure whether this workaround is appropriate. Could you provide guidance on this issue?

Below are the corresponding commands:

    working_dir=/public/home/mkong/Blacklist/03.work/02.mappability
    bowtie_bin=/public/home/mkong/anaconda3/envs/Blacklist/bin
    bowtie_index_dir=/public/home/mkong/Blacklist/03.work/02.mappability/genome
    umap_path=/public/home/mkong/Blacklist/00.soft/umap/umap

    # we only consider kmer=50
    for i in 50; do
        for j in $(seq 0 2442); do
            python ${umap_path}/run_bowtie.py -var_id SGE_TASK_ID -job_id ${j} ${working_dir}/kmers/k${i} ${bowtie_bin} ${bowtie_index_dir} genome.fa
        done
    done

    for i in 50; do
        for j in $(seq 0 20); do
            python ${umap_path}/unify_bowtie.py ${working_dir}/kmers/k${i} ${working_dir}/chrsize.tsv -var_id SGE_TASK_ID -job_id ${j}
        done
    done

mehrankr commented 1 year ago

Dear Mei,

Regarding the version: the v1.2.0 tag was on Bitbucket. For now, you can use the main branch on GitHub: https://github.com/hoffmangroup/umap

This is the latest version and you will be using the correct scripts.

About the error you are describing: this looks like an off-by-one error caused by some task managers using 1-based and others using 0-based indexing. The example script I have provided doesn't have that issue.
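A minimal Bash sketch of how such a mismatch can drop exactly one chromosome, and why prepending a dummy chr0 hides it. This is not Umap's code: the chromosome list and the `process_task` helper are hypothetical. It assumes a script that uses each job ID directly as a 0-based index while the IDs it receives start at 1, as SGE array task IDs typically do.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of a 0-based vs 1-based job-ID mismatch.
# Not Umap's code: the chromosome list and process_task are made up.

chroms=(chr1 chr2 chr3 chrX chrY)

# Assumes job IDs are 0-based: the ID is used directly as an array index.
# IDs outside the array produce no output (nothing is processed).
process_task() {
    local id=$1
    if (( id >= 0 && id < ${#chroms[@]} )); then
        echo "${chroms[$id]}"
    fi
}

# Job IDs handed out 1-based (1..N).
echo "1-based IDs, 0-based lookup:"
for id in $(seq 1 ${#chroms[@]}); do
    process_task "$id"
done
# chr1 is never printed, and ID N falls off the end of the list.

# Prepending a dummy chr0 shifts every real chromosome up by one,
# so the dummy becomes the entry that is skipped.
chroms=(chr0 chr1 chr2 chr3 chrX chrY)
echo "With a dummy chr0 prepended:"
for id in $(seq 1 ${#chroms[@]}); do
    process_task "$id"
done

# Converting the 1-based IDs to 0-based indexes needs no dummy entry.
chroms=(chr1 chr2 chr3 chrX chrY)
echo "1-based IDs shifted to 0-based:"
for id in $(seq 1 ${#chroms[@]}); do
    process_task "$((id - 1))"
done
```

Where the offset should actually be corrected depends on how the real scripts read their job IDs, which this sketch does not model.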

You can investigate the contents of the chrsize_index.tsv file to see why this is happening. Adjusting that file might be easier than adding a fake chromosome.
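One quick way to inspect such a file is to print each line next to both a 0-based and a 1-based row number, then check which number the first chromosome gets and whether it matches the first job ID that is submitted. The sketch below assumes only that chrsize_index.tsv is a plain text file with one entry per line; its exact columns are not described in this thread, and the `show_rows` helper and the path in the usage line are illustrative.

```bash
#!/usr/bin/env bash
# Print each line of a file next to its 0-based and 1-based row number.
# Assumes one entry per line; makes no assumption about the columns.
show_rows() {
    awk 'BEGIN { OFS = "\t"; print "row0", "row1", "line" }
         { print NR - 1, NR, $0 }' "$1"
}

# Usage (path is illustrative):
# show_rows /path/to/chrsize_index.tsv
```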

Best, Mehran