virus-evolution / gofasta

MIT License

--reference sometimes accepts filename and sometimes chrID #45

Status: Open. mmokrejs opened this issue 7 months ago.

mmokrejs commented 7 months ago

Hi, I find it puzzling that --reference sometimes expects the ID of the reference record in the msa. I suggest renaming the flag: other gofasta commands expect a filename after it, so it is easy to misunderstand.

$ ./gofasta variants -a 7-WU-FF1.gff -r 7-WU-FF1.fasta --msa 7-WU-FF.fa
Error: Couldn't find reference (7-WU-FF1.fasta) in msa
Usage:
  gofasta variants [flags]

Flags:
      --msa string          Multiple sequence alignment in fasta format (default "stdin")
  -r, --reference string    The ID of the reference record in the msa
  -a, --annotation string   Genbank or GFF3 format annotation file. Must have suffix .gb or .gff
  -o, --outfile string      Name of the file of variants to write (default "stdout")
      --start int           Only report variants after (and including) this position (default -1)
      --end int             Only report variants before (and including) this position (default -1)
      --aggregate           Report the proportions of each change
      --threshold float     If --aggregate, only report changes with a freq greater than or equal to this value
      --append-snps         Report the codon's SNPs in parenthesis after each amino acid mutation
  -t, --threads int         Number of threads to use (default 1)
  -h, --help                help for variants
$ ./gofasta variants -a 7-WU-FF1.gff -r 7-WU-FF --msa 7-WU-FF.fa
Error: Error parsing gff SeqID: 7-WU-FF
benjamincjackson commented 7 months ago

I guess this was a bit silly in retrospect, but as gofasta is now at version > 1.0.0, I don't want to introduce breaking changes to the API.

mmokrejs commented 7 months ago

At least improving the error output would be helpful. It took me ages to realize the message was about the record ID and not the FASTA filename. You could also still introduce --reference-filename and respect it in as many places as possible. How many users are there around the world?