nf-core / funcscan

(Meta-)genome screening for functional and natural product gene sequences
https://nf-co.re/funcscan
MIT License

DRAMP download error #423

Open jasmezz opened 1 month ago

jasmezz commented 1 month ago

EDIT: PLEASE SEE https://github.com/nf-core/funcscan/issues/423#issuecomment-2472437597 FOR LATEST STATUS

Description of the bug

The DRAMP database issue reported by J.C. Ruitenberg (https://nfcore.slack.com/archives/C02K5GX2W93/p1728552583863299) seems to have permanently broken the DRAMP download and processing step since the database was updated on September 8 (see: http://dramp.cpu-bioinfor.org/static/update.php), 3 days after the funcscan 2.0.0 release.

I have already reported the problem to the DRAMP maintainers. If it is not solved soon, the DRAMP download script bin/ampcombi_download.py should be updated on our side so that it does not crash when non-ASCII characters are found in the database sheet during parsing.
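One way to make the parsing tolerant could look roughly like the sketch below. It is not the actual code of bin/ampcombi_download.py: it uses only the standard library instead of Biopython, and the function names and record IDs are hypothetical. The idea is to split the records into those with pure-ASCII sequences and those without, so the second group can be logged and skipped instead of crashing the run.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_ascii_records(records):
    """Separate records with pure-ASCII sequences from those without."""
    kept, skipped = [], []
    for header, seq in records:
        (kept if seq.isascii() else skipped).append((header, seq))
    return kept, skipped


# Hypothetical two-record example; the second sequence contains U+03A6.
example = ">example_ok\nGLFDIVKK\n>example_bad\nG\u03a6KK\n"
kept, skipped = split_ascii_records(read_fasta(example))
print("kept:", [h for h, _ in kept])
print("skipped:", [h for h, _ in skipped])
```

Reporting the skipped IDs rather than dropping them silently would also make it easier to pass the list on to the DRAMP maintainers.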

Example error here: https://github.com/nf-core/funcscan/actions/runs/11293369456/job/31411353829?pr=422

jasmezz commented 1 month ago

Still no reply from the DRAMP maintainers (neither to my e-mail nor in the form of a fixed DRAMP database upload). I think we should go ahead and adapt the DRAMP download script, @Darcy220606 👾 According to their homepage, the database is under a CC-BY license, meaning we could upload it on our side (Zenodo?) to track versions and provide a working DRAMP version. Either:

jasmezz commented 3 weeks ago

Update: the DRAMP maintainers fixed the reported error (removed the invalid 工 character from sequence DRAMP31926) and updated the database online. However, it turned out that other amino acid sequences contain invalid characters as well (see screenshot). Waiting for updates on that. In any case, @Darcy220606 will adapt the DRAMP download script to filter out such AMPs so that the pipeline does not crash when parsing the database.
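Filtering could be sketched along these lines, assuming an allow-list of the 20 standard amino acid letters. Whether DRAMP legitimately uses further codes (for example X, B, Z, U or O) is not confirmed here, so the allowed set is left as a parameter. The function names and record IDs are hypothetical, and this is not the fix that will go into the download script.

```python
# Assumption: only the 20 standard amino acid one-letter codes are valid.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def invalid_residues(sequence, allowed=STANDARD_AA):
    """Return the sorted characters in `sequence` that are not in `allowed`."""
    return sorted(set(sequence.upper()) - allowed)


def filter_valid(records, allowed=STANDARD_AA):
    """Split (id, sequence) pairs into valid records and a report of rejects.

    Returns (valid, rejected), where `rejected` maps each rejected ID to the
    list of offending characters found in its sequence.
    """
    valid, rejected = [], {}
    for seq_id, seq in records:
        bad = invalid_residues(seq, allowed)
        if bad:
            rejected[seq_id] = bad
        else:
            valid.append((seq_id, seq))
    return valid, rejected


# Hypothetical records; the sequences are made up for illustration.
records = [
    ("example_1", "GLFDIVKK"),
    ("example_2", "GL\u5de5DIVKK"),  # contains the CJK character U+5DE5
    ("example_3", "G\u03a6KK"),      # contains the Greek letter U+03A6
]
valid, rejected = filter_valid(records)
print("valid:", valid)
print("rejected:", rejected)
```

An allow-list catches more than an ASCII check does, since it also rejects digits, punctuation and stray whitespace inside a sequence.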

Image

jfy133 commented 3 weeks ago

Where on earth are these characters coming from?!

It's sort of making me question the reliability of the database...

swelbo commented 1 week ago

Hello - has there been any progress with this?

I'm getting this error when running:

nextflow run nf-core/funcscan -profile test,docker --outdir ~/path/to/test

I assume this is the same issue.

#######

Command error:
  Traceback (most recent call last):
    File "/home/harry/.nextflow/assets/nf-core/funcscan/bin/ampcombi_download.py", line 78, in <module>
      download_DRAMP("amp_ref_database")
    File "/home/harry/.nextflow/assets/nf-core/funcscan/bin/ampcombi_download.py", line 49, in download_DRAMP
      for record in seq_record:
    File "/usr/local/lib/python3.11/site-packages/Bio/SeqIO/Interfaces.py", line 72, in __next__
      return next(self.records)
             ^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/Bio/SeqIO/FastaIO.py", line 246, in iterate
      Seq(sequence), id=first_word, name=first_word, description=title
      ^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/Bio/Seq.py", line 2034, in __init__
      self._data = bytes(data, encoding="ASCII")
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  UnicodeEncodeError: 'ascii' codec can't encode character '\u03a6' in position 1: ordinal not in range(128)
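For anyone who wants to see the failure in isolation, the last frame of the traceback can probably be reproduced without Biopython or the DRAMP file. The sketch below repeats the bytes(data, encoding="ASCII") call shown in the traceback on a made-up sequence; the helper name and the sequence are hypothetical.

```python
def first_non_ascii(sequence):
    """Return (position, character) of the first non-ASCII character, or None.

    Uses the same call that appears in the traceback and reads the position
    from the resulting UnicodeEncodeError.
    """
    try:
        bytes(sequence, encoding="ASCII")
    except UnicodeEncodeError as err:
        return err.start, sequence[err.start]
    return None


# Made-up sequence with U+03A6 (Greek capital phi) at index 1.
print(first_non_ascii("G\u03a6KK"))
print(first_non_ascii("GLFDIVKK"))
```

The error is raised when the sequence string is encoded, so a single affected record is enough to stop the whole download step.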

Darcy220606 commented 1 week ago

Hi @swelbo, indeed, we are currently working on a fix for ampcombi, so it should be fixed in the coming two weeks. :)