refresh-bio / agc

Assembled Genomes Compressor
MIT License
152 stars, 13 forks

genome name in fasta idline #9

Closed: lynnjo closed this issue 6 months ago

lynnjo commented 10 months ago

Hi all. Is there a way to include the genome name in the ID line when AGC outputs a FASTA file?

For example: I make a query to get chr1 from different genomes. This query might look like:

agc getctg assemblies.agc chr1@LineA chr1@LineB chr1@LineC > fasta.out

AGC's output shows id lines of ">chr1" for all 3 of these, which makes it difficult to distinguish which sequence belongs to which genome. We are hoping to use AGC for our research project, and this is a scenario that will frequently be encountered.

Any suggestions?

lh3 commented 9 months ago

AGC keeps FASTA comments. I would recommend encoding the sample/species information there so that you can identify the source later.
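One way to follow this suggestion is to tag every header before the files are added to the archive. The sketch below is only an illustration, not part of AGC: the function name `tag_fasta_headers` and the `sample=<name>` comment convention are assumptions made for this example. It appends the sample name as a whitespace-separated comment after the sequence ID and leaves sequence lines untouched.

```python
def tag_fasta_headers(lines, sample):
    """Yield FASTA lines with 'sample=<name>' appended to each header line.

    The 'sample=' key is a hypothetical convention chosen for this sketch;
    any text after the first whitespace in a header is a FASTA comment.
    """
    tag = f"sample={sample}"
    for line in lines:
        line = line.rstrip("\n")
        # Only header lines are changed; skip headers that already carry the tag.
        if line.startswith(">") and tag not in line.split():
            line = f"{line} {tag}"
        yield line


# Small in-memory example; real input would be the lines of one genome's FASTA file.
original = [">chr1", "ACGT", ">chr2 existing comment", "GGCC"]
tagged = list(tag_fasta_headers(original, "LineA"))
for row in tagged:
    print(row)
```

Running this once per genome, with that genome's name as `sample`, gives headers such as `>chr1 sample=LineA`, so identical contig names from different genomes stay distinguishable as long as the comment is carried through to the output.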

lynnjo commented 9 months ago

Thank you - we'll try updating our fasta files and the code that parses AGC output.
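A minimal sketch of what the parsing side could look like, assuming the headers in the AGC output carry a comment of the form `sample=<name>` (a hypothetical convention for this example, not something AGC defines). The helper names `parse_header` and `group_by_sample` are likewise illustrative.

```python
def parse_header(header):
    """Split a FASTA header into (contig_id, sample); sample is None if untagged."""
    text = header[1:] if header.startswith(">") else header
    fields = text.split()
    contig_id = fields[0] if fields else ""
    sample = None
    for field in fields[1:]:
        # 'sample=' is the assumed comment key, not an AGC feature.
        if field.startswith("sample="):
            sample = field[len("sample="):]
    return contig_id, sample


def group_by_sample(lines):
    """Return {(sample, contig_id): sequence} from FASTA-formatted lines."""
    chunks = {}
    key = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            contig_id, sample = parse_header(line)
            key = (sample, contig_id)
            chunks[key] = []
        elif key is not None:
            chunks[key].append(line)
    return {k: "".join(v) for k, v in chunks.items()}


# In-memory stand-in for the contents of fasta.out.
output_lines = [">chr1 sample=LineA", "ACGT", "TT", ">chr1 sample=LineB", "ACGA"]
sequences = group_by_sample(output_lines)
print(sequences)
```

Keying the result on both the sample and the contig name keeps the three `chr1` sequences from the original query separate instead of overwriting one another.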