pritykinlab / guidescanpy

Alternative formats for chr2acc files #51

Closed vineetbansal closed 1 year ago

vineetbansal commented 1 year ago

There are two file formats that map chromosome "common names" to their accession numbers.

The chr2acc format that we're supporting now, e.g.: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_assembly_structure/Primary_Assembly/assembled_chromosomes/chr2acc

#Chromosome Accession.version
I   NC_001133.9
II  NC_001134.8
III NC_001135.5
IV  NC_001136.10

and the *.chromAlias.txt format, e.g. https://hgdownload.soe.ucsc.edu/goldenPath/hs1/bigZips/hs1.chromAlias.txt

# ucsc  genbank refseq  assembly    ncbi
chr1    CP068277.2  NC_060925.1 1   1
chr10   CP068268.2  NC_060934.1 10  10
chr11   CP068267.2  NC_060935.1 11  11
chr12   CP068266.2  NC_060936.1 12  12

We'll need to modify our add-organism command so that it accepts the latter format as well.
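For illustration, here is a rough sketch of how both formats could be read into one name-to-accession mapping. This is not guidescanpy's actual code: the function name `parse_chr_mapping` and the `accession_column` parameter are made up for this example. It assumes that both files carry their column names in a leading `#` line, that a header containing the requested column (e.g. `refseq`) means the chromAlias layout, and that the first column is the name to key on.

```python
def _split(line):
    # Prefer tabs when present so that empty fields keep their position;
    # fall back to whitespace for space-aligned samples like the ones above.
    return line.split("\t") if "\t" in line else line.split()


def parse_chr_mapping(text, accession_column="refseq"):
    """Return {chromosome name: accession} for chr2acc or chromAlias text.

    Hypothetical helper: the name and the accession_column parameter are
    assumptions for this sketch, not part of guidescanpy.
    """
    mapping = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip("\r\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            # Both formats put their column names in a leading comment line.
            header = [h.strip() for h in _split(line.lstrip("#").strip())]
            continue
        fields = [f.strip() for f in _split(line)]
        if header and accession_column in header:
            # chromAlias layout: pick the requested accession column.
            idx = header.index(accession_column)
            if idx >= len(fields) or not fields[idx]:
                continue  # no accession of this type for this sequence
            mapping[fields[0]] = fields[idx]
        elif len(fields) >= 2:
            # chr2acc layout: name in column 1, accession.version in column 2.
            mapping[fields[0]] = fields[1]
    return mapping
```

Passing `accession_column="genbank"` would select the GenBank accessions from a chromAlias file instead; which column add-organism should use is an open choice here.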