hoffmangroup / umap


Some hg38 Bismap k100 tracks do not contain chr1 or chr2 #9

Open njspix opened 2 years ago

njspix commented 2 years ago

For instance, the Bismap hg38 single-read k100 file from https://bismap.hoffmanlab.org/: [image]

mehrankr commented 1 year ago

Hello,

Please use the tracks available in the UCSC Genome Browser mappability track. Also, could you mention where you downloaded these incomplete tracks from?

Best, Mehran

njspix commented 1 year ago

Hello Mehran,

The incomplete track was downloaded from https://bismap.hoffmanlab.org/raw/hg38/k100.bismap.bed.gz. I tried to use the UCSC tracks, but was unable to find a way to access them programmatically. Ideally, I would love to have a (semi-)permanent online resource to point my scripts to, one that will always give the same version of the track.
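For anyone who wants to check a downloaded track for the same problem, here is a minimal sketch that counts BED records per chromosome in a gzip-compressed BED file and reports which expected chromosomes are absent. The function names `count_intervals_per_chrom` and `missing_chroms` are hypothetical helpers written for this illustration; they are not part of umap.

```python
import gzip
from collections import Counter


def count_intervals_per_chrom(bed_gz_path):
    """Count BED records per chromosome in a gzip-compressed BED file."""
    counts = Counter()
    with gzip.open(bed_gz_path, "rt") as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines and UCSC-style header/comment lines.
            if not fields or fields[0] in ("track", "browser") or fields[0].startswith("#"):
                continue
            counts[fields[0]] += 1
    return counts


def missing_chroms(counts, expected):
    """Return the expected chromosome names that have no records at all."""
    return [chrom for chrom in expected if counts.get(chrom, 0) == 0]
```

After downloading the file locally, something like `missing_chroms(count_intervals_per_chrom("k100.bismap.bed.gz"), ["chr1", "chr2"])` would list whichever of those chromosomes have no intervals.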

mehrankr commented 1 year ago

Sorry about this problem. I looked into the folder, and neither the bedgraph.gz nor the wg.gz files have this issue. Please use those files for now, since they are complete.

Those files are more complete than the BED version; the BED version basically thresholds those files at >0 to include or exclude genomic regions.
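As a rough illustration of the thresholding described above, the sketch below keeps bedGraph rows whose value is greater than 0 and emits them as BED intervals. It is only an approximation of the idea, not the code used to build the published BED files: the function name `bedgraph_to_bed` is hypothetical, the input is assumed to be sorted by chromosome and start, and merging directly adjacent intervals is an added assumption.

```python
def bedgraph_to_bed(lines, threshold=0.0):
    """Yield (chrom, start, end) for bedGraph rows with value > threshold.

    Assumes rows are sorted by chromosome and start; rows that touch
    end-to-start on the same chromosome are merged into one interval.
    """
    current = None
    for line in lines:
        fields = line.split()
        # Skip header/comment lines and anything without four columns.
        if (len(fields) < 4 or fields[0] in ("track", "browser")
                or fields[0].startswith("#")):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if float(fields[3]) <= threshold:
            continue
        if current and current[0] == chrom and current[2] == start:
            # Extend the open interval instead of starting a new one.
            current = (chrom, current[1], end)
        else:
            if current:
                yield current
            current = (chrom, start, end)
    if current:
        yield current
```

To apply it to one of the complete files, the lines can come from `gzip.open(path, "rt")`, and each yielded tuple can be written out as a tab-separated BED row.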

In addition, the .uint files at https://bismap.hoffmanlab.org/raw/uint/hg38/ are the best option for programmatic access. You can see the code for using them here:

https://github.com/hoffmangroup/umap/blob/master/umap/uint8_to_bed_parallel.py#L280

njspix commented 1 year ago

Thank you for the information! I am using the multi-read bed files for now.