guillaumecharbonnier / mw-lib

Metaworkflow library of generalised Snakemake rules
MIT License

Replacing all chrominfo files with fa.fai #28

Open guillaumecharbonnier opened 1 year ago

guillaumecharbonnier commented 1 year ago

I have noticed that fa.fai files can be used as chromInfo files. Since we currently have some discrepancies between aligner indexes and chrominfo for alt chromosomes (see the tail and head output below), I am thinking about changing every chrominfo entry in src/tables/chrominfo_ids.tsv to its respective fa.fai file, either the one available in the igenome archives or one computed in the workflow from the fa. @SebastienNin Do you see any issues if I do that? And can I push the update directly to the master branch?
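
To make the "computing it in the workflow from fa" option concrete, here is a minimal Python sketch of what a fa.fai file contains. In practice `samtools faidx genome.fa` would normally do this job; the functions `build_fai` and `write_fai` below are hypothetical names used only for illustration, and the sketch assumes an uncompressed FASTA whose sequence lines all have the same width within a record. The first two columns (name and length) are the same fields that tools usually read from a chromInfo file, which is why the substitution should work.

```python
def build_fai(fasta_path):
    """Return [name, length, offset, linebases, linewidth] for each FASTA record."""
    records = []
    current = None
    position = 0  # byte offset of the line currently being read
    with open(fasta_path, "rb") as handle:
        for raw in handle:
            if raw.startswith(b">"):
                # Header line: the name is the first whitespace-delimited token.
                name = raw[1:].split()[0].decode()
                # The sequence starts right after the header line.
                current = [name, 0, position + len(raw), 0, 0]
                records.append(current)
            elif current is not None:
                bases = len(raw.rstrip(b"\r\n"))
                if current[3] == 0 and bases:
                    # The first sequence line fixes the line geometry.
                    current[3] = bases      # bases per line
                    current[4] = len(raw)   # bytes per line, newline included
                current[1] += bases
            position += len(raw)
    return records


def write_fai(fasta_path, fai_path):
    """Write the records as a tab-separated, five-column index file."""
    with open(fai_path, "w") as out:
        for record in build_fai(fasta_path):
            out.write("\t".join(str(field) for field in record) + "\n")
```

Calling `write_fai("genome.fa", "genome.fa.fai")` would produce the index next to the FASTA; both paths are placeholders, not paths from the workflow.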

chrominfo-GRCh38· ["out/gunzip/to-stdout/rsync/ucsc/goldenPath/hg38/database/chromInfo.txt"]

chrominfo-GRCh38· ["out/tar/xvzf_igenome/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa.fai"]

$ tail out/gunzip/to-stdout/rsync/ucsc/goldenPath/hg38/database/chromInfo.txt out/tar/xvzf_igenome/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa.fai
==> out/gunzip/to-stdout/rsync/ucsc/goldenPath/hg38/database/chromInfo.txt <==
chrUn_KI270379v1        1045    /gbdb/hg38/hg38.2bit
chrUn_KI270329v1        1040    /gbdb/hg38/hg38.2bit
chrUn_KI270419v1        1029    /gbdb/hg38/hg38.2bit
chrUn_KI270336v1        1026    /gbdb/hg38/hg38.2bit
chrUn_KI270312v1        998     /gbdb/hg38/hg38.2bit
chrUn_KI270539v1        993     /gbdb/hg38/hg38.2bit
chrUn_KI270385v1        990     /gbdb/hg38/hg38.2bit
chrUn_KI270423v1        981     /gbdb/hg38/hg38.2bit
chrUn_KI270392v1        971     /gbdb/hg38/hg38.2bit
chrUn_KI270394v1        970     /gbdb/hg38/hg38.2bit

==> out/tar/xvzf_igenome/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa.fai <==
chrUn_KI270753v1        62944   3143089101      70      71
chrUn_KI270754v1        40191   3143153063      70      71
chrUn_KI270755v1        36723   3143193947      70      71
chrUn_KI270756v1        79590   3143231313      70      71
chrUn_KI270757v1        71251   3143312158      70      71
chrUn_GL000214v1        137718  3143384546      70      71
chrUn_KI270742v1        186739  3143524351      70      71
chrUn_GL000216v2        176608  3143713877      70      71
chrUn_GL000218v1        161147  3143893127      70      71
chrEBV  171823  3144056708      70      71
$ head out/gunzip/to-stdout/rsync/ucsc/goldenPath/hg38/database/chromInfo.txt out/tar/xvzf_igenome/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa.fai
==> out/gunzip/to-stdout/rsync/ucsc/goldenPath/hg38/database/chromInfo.txt <==
chr1    248956422       /gbdb/hg38/hg38.2bit
chr2    242193529       /gbdb/hg38/hg38.2bit
chr3    198295559       /gbdb/hg38/hg38.2bit
chr4    190214555       /gbdb/hg38/hg38.2bit
chr5    181538259       /gbdb/hg38/hg38.2bit
chr6    170805979       /gbdb/hg38/hg38.2bit
chr7    159345973       /gbdb/hg38/hg38.2bit
chrX    156040895       /gbdb/hg38/hg38.2bit
chr8    145138636       /gbdb/hg38/hg38.2bit
chr9    138394717       /gbdb/hg38/hg38.2bit

==> out/tar/xvzf_igenome/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa.fai <==
chr1    248956422       112     70      71
chr2    242193529       252513167       70      71
chr3    198295559       498166716       70      71
chr4    190214555       699295181       70      71
chr5    181538259       892227221       70      71
chr6    170805979       1076358996      70      71
chr7    159345973       1249605173      70      71
chr8    145138636       1411227630      70      71
chr9    138394717       1558439788      70      71
chr10   133797422       1698811686      70      71 
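
The head and tail output above only shows the ends of each file, so a fuller comparison may be useful before switching. The following is a rough Python sketch, not part of the workflow: `read_sizes` and `compare_sizes` are hypothetical helper names, and the code assumes both files are whitespace-delimited with the sequence name in column 1 and its length in column 2, as in the listings above.

```python
def read_sizes(path):
    """Map sequence name -> length from the first two columns of a file."""
    sizes = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                sizes[fields[0]] = int(fields[1])
    return sizes


def compare_sizes(chrominfo_path, fai_path):
    """Report names present in only one file and names whose lengths differ."""
    chrominfo = read_sizes(chrominfo_path)
    fai = read_sizes(fai_path)
    shared = set(chrominfo) & set(fai)
    return {
        "only_in_chrominfo": sorted(set(chrominfo) - set(fai)),
        "only_in_fai": sorted(set(fai) - set(chrominfo)),
        "length_mismatch": sorted(n for n in shared if chrominfo[n] != fai[n]),
    }
```

Running `compare_sizes` on the chromInfo.txt and genome.fa.fai paths shown above would list which sequences exist in only one of the two files, independent of the order in which each file sorts them.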

SebastienNin commented 1 year ago

Hi @guillaumecharbonnier, I don't see any problem with that. You can merge it.