tolkit / fasta_windows

A Rust implementation of sliding window stats over fastas.
MIT License

Change ID for Released Genomes #6

Closed: charlottewright closed this issue 3 years ago

charlottewright commented 3 years ago

When running fasta_windows on released genomes, it takes the contig ID to fill the ID column in the output. It would be more useful to use the chromosome/scaffold ID, e.g. Chr W rather than CAJHVJ010000001.1.

Here is my manual workaround that I am using currently: https://github.com/tolkit/lepidoptera-pipelines/blob/main/code/Replace_contig_with_chr_names
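
For readers who cannot open the link, a workaround of this kind can be sketched roughly as follows. This is not the linked script, only a minimal illustration under assumptions: the function name `rename_ids` and the `id_map` dictionary are hypothetical, and the two accession/scaffold pairs are taken from the example output in this thread. A real mapping would have to be built from the assembly's own metadata.

```python
import csv
import io


def rename_ids(rows, id_map, column="Chrom_ID"):
    """Yield copies of the rows with `column` replaced wherever a mapping exists.

    IDs that are missing from `id_map` are passed through unchanged.
    """
    for row in rows:
        new_row = dict(row)
        new_row[column] = id_map.get(row[column], row[column])
        yield new_row


# Hypothetical mapping from contig accession to scaffold name.
id_map = {
    "CAJHUF010000001.1": "scaffold38ctg1",
    "CAJHUF010000002.1": "scaffold39ctg1",
}

# A small in-memory table standing in for the real CSV file.
table = io.StringIO(
    "Chrom_ID,Length\n"
    "CAJHUF010000001.1,60000\n"
    "CAJHUF010000002.1,50000\n"
    "CAJHUF010000003.1,40000\n"
)

renamed = list(rename_ids(csv.DictReader(table), id_map))
for row in renamed:
    print(row["Chrom_ID"], row["Length"])
```

The third row keeps its accession because it has no entry in the mapping, which makes gaps in the mapping easy to spot in the output.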

charlottewright commented 3 years ago

E.g., from this:

head data_original_chr_assignments/spp_files/Inachis_io.fasta_windows.tsv_Avg_GC_Kmer.csv 
,Chrom_ID,Length,Average_GC,Avg_Kmer,Spp,Phylo
0,CAJHUF010000001.1,60000,0.4114059949999999,2.6814999999999998,Inachis_io.fasta_windows.tsv,Inachis_io.fasta_windows.tsv
1,CAJHUF010000002.1,50000,0.42049072800000004,2.8046,Inachis_io.fasta_windows.tsv,Inachis_io.fasta_windows.tsv
2,CAJHUF010000003.1,40000,0.388761405,2.20275,Inachis_io.fasta_windows.tsv,Inachis_io.fasta_windows.tsv
3,CAJHUF010000004.1,40000,0.374201575,2.583,Inachis_io.fasta_windows.tsv,Inachis_io.fasta_windows.tsv
4,CAJHUF010000005.1,40000,0.425641625,2.5685,Inachis_io.fasta_windows.tsv,Inachis_io.fasta_windows.tsv
5,CAJHUF010000006.1,30000,0.3933629166666666,2.482,Inachis_io.fasta_windows.tsv,Inachis_io.fasta_windows.tsv
6,CAJHUF010000007.1,30000,0.4030980533333333,2.6076666666666664,Inachis_io.fasta_windows.tsv,Inachis_io.fasta_windows.tsv
7,CAJHUF010000008.1,30000,0.3821424,2.779333333333333,Inachis_io.fasta_windows.tsv,Inachis_io.fasta_windows.tsv
8,CAJHUF010000009.1,30000,0.35684131,2.4803333333333333,Inachis_io.fasta_windows.tsv,Inachis_io.fasta_windows.tsv

To this:

head data_reassigned_chr/spp_files/Inachis_io_Avg_GC_Kmer.csv 
,Chrom_ID,Length,Average_GC,Avg_Kmer,Spp,Phylo
0,scaffold38ctg1,60000,0.4114059949999999,2.6814999999999998,Inachis_io,Inachis_io
1,scaffold39ctg1,50000,0.42049072800000004,2.8046,Inachis_io,Inachis_io
2,scaffold40ctg1,40000,0.388761405,2.20275,Inachis_io,Inachis_io
3,scaffold41ctg1,40000,0.374201575,2.583,Inachis_io,Inachis_io
4,scaffold42ctg1,40000,0.425641625,2.5685,Inachis_io,Inachis_io
5,scaffold43ctg1,30000,0.3933629166666666,2.482,Inachis_io,Inachis_io
6,scaffold44ctg1,30000,0.4030980533333333,2.6076666666666664,Inachis_io,Inachis_io
7,scaffold45ctg1,30000,0.3821424,2.779333333333333,Inachis_io,Inachis_io
8,scaffold47ctg1,30000,0.35684131,2.4803333333333333,Inachis_io,Inachis_io

Euphrasiologist commented 3 years ago

Okay, I'll add a flag to optionally print the fasta header description - to be implemented sometime this week!
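
To illustrate what "header description" means here: by FASTA convention, the sequence ID is the text after `>` up to the first whitespace, and the rest of the line is the description. The sketch below shows that split in Python. It is not the fasta_windows implementation (which is written in Rust), and the function name `split_header` and the example header text are made up for illustration.

```python
def split_header(header_line):
    """Split a FASTA header line into (sequence_id, description).

    The ID is everything up to the first whitespace; the description is
    whatever follows, or an empty string if the header has only an ID.
    """
    text = header_line.lstrip(">").strip()
    parts = text.split(maxsplit=1)
    seq_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return seq_id, description


# Hypothetical header; real descriptions depend on how the assembly was submitted.
seq_id, description = split_header(">CAJHUF010000001.1 example description text")
print(seq_id)
print(description)
```

Whether the description actually contains a usable chromosome or scaffold name depends on the assembly's headers, so it is worth checking a few headers before relying on it.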

Euphrasiologist commented 3 years ago

@charlottewright let me know if this is okay!