refresh-bio / agc

Assembled Genomes Compressor
MIT License

sample names include ".fa" when gzipped files are used #14

Closed by lynnjo 5 months ago

lynnjo commented 6 months ago

Previously I loaded uncompressed FASTA files into AGC. When the file name was "CML103.fa", the sample name shown by "listset" was "CML103".

I am now loading gzipped files into AGC, and the file names look like "CML103.fa.gz". This results in the ".fa" being included in the sample name, so that it is now "CML103.fa".

Is there a way around this? Do the gzipped FASTA files have to be named "CML103.gz", without the ".fa"? This is a little annoying because compressing a file appends ".gz" to its original name. But if necessary, we can rename all of these files.

I ask as our software matches the sample name from AGC compressed file to sample names that appear in VCF files.

Could someone comment on the naming conventions and on how AGC handles compressed files? Thanks.
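
One way to work around the mismatch described above without renaming any files is to normalize the names on the side that does the matching against the VCF. The following is only a minimal sketch in Python: the helper `normalize_sample_name` and the suffix list `FASTA_SUFFIXES` are hypothetical names made up for illustration. They are not part of AGC and do not describe how AGC itself derives sample names.

```python
# Hypothetical helper for illustration only; not part of AGC.
# Assumption: sample names may carry a trailing ".gz" and/or a FASTA
# extension that should be dropped before comparing with VCF sample names.
FASTA_SUFFIXES = (".fa", ".fasta", ".fna")


def normalize_sample_name(name: str) -> str:
    """Strip a trailing ".gz" and then one trailing FASTA extension, if present."""
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    for suffix in FASTA_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


if __name__ == "__main__":
    for raw in ("CML103", "CML103.fa", "CML103.fa.gz"):
        print(raw, "->", normalize_sample_name(raw))
```

With this in place, "CML103", "CML103.fa" and "CML103.fa.gz" all map to "CML103", so the comparison no longer depends on whether the input files were gzipped.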

sebastiandeorowicz commented 6 months ago

Thank you for pointing out the issue. We've fixed it, and the forthcoming v3.1 release will contain the fix.

lynnjo commented 6 months ago

Thank you!

sebastiandeorowicz commented 5 months ago

AGC 3.1 is ready. The fix is implemented.