hallamlab / metapathways2

MetaPathways v2.0: A master-worker model for environmental Pathway/Genome Database construction on grids and clouds
http://hallam.microbiology.ubc.ca/MetaPathways/

Truncating nucleotide sequences with ambiguous nucleotides #90

Closed: joshamilton closed this 7 years ago

joshamilton commented 8 years ago

During pre-processing, MetaPathways shortens FASTA sequences that contain ambiguous nucleotide codes. It appears to split each sequence at the ambiguous nucleotides and retain only the longest sub-sequence. This should be documented somewhere, and it would be nice to have an option to disable this behavior.
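
To make the observed behavior concrete, here is a minimal sketch of what the truncation appears to do. It is an illustration only: the name `longest_unambiguous_run` is hypothetical, the set of characters treated as unambiguous (A, C, G, T) is an assumption, and this is not the actual MetaPathways code.

```python
import re

# Hypothetical stand-in for the behavior described in this issue.
# Assumption: anything other than A, C, G, T counts as ambiguous.
AMBIGUOUS = re.compile(r"[^ACGTacgt]+")


def longest_unambiguous_run(seq):
    """Split seq at runs of ambiguous characters and return the longest piece."""
    pieces = AMBIGUOUS.split(seq)
    # On ties, max() returns the first piece of maximal length.
    return max(pieces, key=len)


# Splits into "ACGT", "ACGTACGT", and "AC"; only the middle piece survives.
print(longest_unambiguous_run("ACGTNNACGTACGTRAC"))
```

With an input like this, a 17-base sequence comes out as 8 bases, which is the kind of silent shortening reported here.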

taltman commented 8 years ago

Funny, I just hit this bug the other day. The solution is to go into the file MetaPathways_filter_input.py, and comment out the line that calls the function filter_sequence:

# seqvalue = filter_sequence(seq)

Instead, just assign seqvalue to seq:

seqvalue = seq
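
If editing the file by hand each time is inconvenient, the same change could be made switchable. The sketch below is only a suggestion: `preprocess` and `truncate_ambiguous` are hypothetical names that do not exist in MetaPathways, and the `filter_sequence` shown here is a placeholder whose behavior is assumed from the description in this issue.

```python
import re


def filter_sequence(seq):
    # Placeholder for the real function in MetaPathways_filter_input.py.
    # Assumed behavior: keep the longest run of unambiguous A/C/G/T bases.
    return max(re.split(r"[^ACGTacgt]+", seq), key=len)


def preprocess(seq, truncate_ambiguous=False):
    """Return seq unchanged unless truncation is explicitly requested."""
    # Hypothetical switch: off by default, so sequences pass through intact.
    return filter_sequence(seq) if truncate_ambiguous else seq


seqvalue = preprocess("ACNGGT")                           # left as "ACNGGT"
trimmed = preprocess("ACNGGT", truncate_ambiguous=True)   # shortened to "GGT"
```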

I've informed the developers of the bug, and they said that they're working on it. They'd have it fixed faster if they were open to a pull request... :-)

joshamilton commented 8 years ago

Thanks! Nice to see where to modify the existing code so I don't have to write a workaround!