jdidion / atropos

An NGS read trimming tool that is specific, sensitive, and speedy. (production)

Trimming fails #71

Closed parkerac closed 4 years ago

parkerac commented 6 years ago

I've been running atropos on RNA-seq data, and it has worked for most of the samples, but failed for about 1/10th of them (the command and output are below). I can't seem to find any documentation about this error. Would you be able to provide some insight about what I should do from here?

atropos -a file:adapters.fasta -q 10 -o ${OUTNAME}_output.fastq -se ${FILENAME}

2018-07-20 08:11:43,963 INFO: This is Atropos 1.1.18 with Python 3.6.5
2018-07-20 08:11:44,019 ERROR: Error executing command trim
Traceback (most recent call last):
  File "/homedir/lib/python3.6/site-packages/atropos/commands/base.py", line 332, in run
    self.return_code = self()
  File "/homedir/lib/python3.6/site-packages/atropos/commands/trim/__init__.py", line 295, in __call__
    adapter_cache = super().load_known_adapters()
  File "/homedir/lib/python3.6/site-packages/atropos/commands/base.py", line 370, in load_known_adapters
    adapter_cache = AdapterCache(cache_file)
  File "/homedir/lib/python3.6/site-packages/atropos/adapters/__init__.py", line 760, in __init__
    self.seq_to_name, self.name_to_seq = pickle.load(cache)
_pickle.UnpicklingError: pickle data was truncated

jdidion commented 6 years ago

Thanks for reporting this. Have you determined if this error is deterministic (i.e. always happens for the same samples)? If so, could you please provide a minimal dataset that reproduces the error?

parkerac commented 6 years ago

Previously, I was running the files through atropos individually, but I realized that I should be using the paired-end mode. However, I am still getting the same errors. One error is "_pickle.UnpicklingError: pickle data was truncated," and the other error is "EOFError: Ran out of input."

This pair of files created the unpickling error: https://byu.box.com/s/l4msvlptpenngkb3h6k2m7ednr4td64o https://byu.box.com/s/l7iovhax652wvv7jbqlc6nkcbk4xs67a

This pair of files created the EOF error: https://byu.box.com/s/6ibjhuzt361vtcxjdr601bx86h528sra https://byu.box.com/s/08ldgnfigzwbrqccgbp4ppqpn2v2lyis

This was the command I used: atropos -T 4 -a file:adapters.fasta -q 10 -o ${OUTNAME}_output.fastq -p ${OUTNAME}_output.fastq -pe1 ${FILENAME} -pe2 ${FILENAME2}

Thank you for looking into this!

jdidion commented 6 years ago

Hi @parkerac, apologies but I have not had a chance to look at this until now, and it seems those files are no longer available. Could you please share them again? Thanks!

jdidion commented 4 years ago

@parkerac you can also try the new 2.x release of Atropos. I believe I've fixed the issue.

justinshaffer commented 4 years ago

@jdidion - can you please provide details on how to install this new version? When trying pip I'm only able to install 1.1.24. I'm receiving a similar error with this version:

2020-01-10 09:39:58,427 INFO: This is Atropos 1.1.24 with Python 3.6.7
2020-01-10 09:39:58,433 ERROR: Error executing command trim
Traceback (most recent call last):
  File "/home/jpshaffer/software/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/atropos/commands/base.py", line 332, in run
    self.return_code = self()
  File "/home/jpshaffer/software/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/atropos/commands/trim/__init__.py", line 295, in __call__
    adapter_cache = super().load_known_adapters()
  File "/home/jpshaffer/software/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/atropos/commands/base.py", line 370, in load_known_adapters
    adapter_cache = AdapterCache(cache_file)
  File "/home/jpshaffer/software/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/atropos/adapters/__init__.py", line 760, in __init__
    self.seq_to_name, self.name_to_seq = pickle.load(cache)
EOFError: Ran out of input

jdidion commented 4 years ago

@justinshaffer right now 2.x is in pre-release, so you have to use the --pre option with pip.

justinshaffer commented 4 years ago

Thanks @jdidion!

I ran into the following error when trying your suggestion:

$ pip install --pre atropos
Collecting atropos
  Using cached https://files.pythonhosted.org/packages/82/a2/9f1cd425174848cd85a9fbf58b5f35d98e0db0f8868c6516c567d8befc6e/atropos-2.0.0a2.tar.gz
    ERROR: Command errored out with exit status 1:
     command: /home/jpshaffer/software/miniconda3/envs/shotgun_processing/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-87rfjh42/atropos/setup.py'"'"'; __file__='"'"'/tmp/pip-install-87rfjh42/atropos/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-87rfjh42/atropos/pip-egg-info
         cwd: /tmp/pip-install-87rfjh42/atropos/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-87rfjh42/atropos/setup.py", line 122, in <module>
        Path(__file__).parent.absolute() / "README.md", encoding="utf-8"
    FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-87rfjh42/atropos/README.md'

ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

jdidion commented 4 years ago

Sorry about that. It looks like I missed making some updates to the MANIFEST. It's fixed now and I've pushed a new build to pypi (2.0.0-alpha.3).

justinshaffer commented 4 years ago

Thanks! I was able to install but am now running into errors related to my script - I see that the --nextseq-trim parameter is not supported so I removed it - here is what I'm running now:

atropos \
  -a GGGGGGGGGG \
  -A GGGGGGGGGG \
  -pe1 /projects/emp500/02-shotgun/analysis_justin/data/01_atropos_adapters/Berry2_sediment_Pi31_S415_L002_R1_001_atropos_adapters.fastq.gz \
  -pe2 /projects/emp500/02-shotgun/analysis_justin/data/01_atropos_adapters/Berry2_sediment_Pi31_S415_L002_R2_001_atropos_adapters.fastq.gz \
  -o /projects/emp500/02-shotgun/analysis_justin/data/02_atropos_polyg/Berry2_sediment_Pi31_S415_L002_R1_001_atropos_adapters_polyg.fastq.gz \
  -p /projects/emp500/02-shotgun/analysis_justin/data/02_atropos_polyg/Berry2_sediment_Pi31_S415_L002_R2_001_atropos_adapters_polyg.fastq.gz \
  -e 0.1 \
  -q 15 \
  --insert-match-error-rate 0.2 \
  --minimum-length 100 \
  --pair-filter any \
  --report-file /projects/emp500/02-shotgun/analysis_justin/data/02_atropos_polyg/00_atropos_logs/atropos_log_Berry2_sediment_Pi31_S415_L002.txt \
  --report-formats txt \
  -T 16

And here is the error I received:

/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/xphyle/paths.py:149: DeprecationWarning: Use of resolve_path with string path arguments is deprected (lineno /home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/argparse.py:2265)
  f"Use of {func.__name__} with string path arguments is "
/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/xphyle/paths.py:149: DeprecationWarning: Use of resolve_path with string path arguments is deprected (lineno /home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/argparse.py:2265)
  f"Use of {func.__name__} with string path arguments is "
2020-01-10 12:53:25.748 | INFO | atropos.commands.console:_setup_logging:247 - This is Atropos 2.0.0a3 with Python 3.6.7
%(asctime)s %(levelname)s: %(message)s
2020-01-10 12:53:25.791 | ERROR | atropos.console:execute_cli:151 - Error executing command: trim
Traceback (most recent call last):
  File "/home/jpshaffer/software/miniconda3/envs/shotgun_processing/bin/atropos", line 8, in <module>
    sys.exit(main())
    │ │ └ <function main at 0x7f8a78742ae8>
    │ └
    └ <module 'sys' (built-in)>
  File "/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/atropos/__main__.py", line 17, in main
    sys.exit(run_atropos(args))
    │ │ │ └ None
    │ │ └ <function run_atropos at 0x7f8a787341e0>
    │ └
    └ <module 'sys' (built-in)>
  File "/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/atropos/console.py", line 56, in run_atropos
    return execute_cli(args)
    │ └ ['-a', 'GGGGGGGGGG', '-A', 'GGGGGGGGGG', '-pe1', '/projects/emp500/02-shotgun/analysis_justin/data/01_atropos_adapters/Berry2...
    └ <function execute_cli at 0x7f8a6ffa9400>

  File "/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/atropos/console.py", line 144, in execute_cli
    retcode, summary = command.execute(args)
    │ │ └ ['-a', 'GGGGGGGGGG', '-A', 'GGGGGGGGGG', '-pe1', '/projects/emp500/02-shotgun/analysis_justin/data/01_atropos_adapters/Berry2...
    │ └ <classmethod object at 0x7f8a6ffa3898>
    └ atropos.commands.trim.console.TrimCommandConsole
  File "/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/atropos/commands/console.py", line 142, in execute
    options = cls._parse_args(args)
    │ │ └ ['-a', 'GGGGGGGGGG', '-A', 'GGGGGGGGGG', '-pe1', '/projects/emp500/02-shotgun/analysis_justin/data/01_atropos_adapters/Berry2...
    │ └ <classmethod object at 0x7f8a6ffa38d0>
    └ atropos.commands.trim.console.TrimCommandConsole
  File "/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/atropos/commands/console.py", line 170, in _parse_args
    cls._validate_options(options, parser)
    │ │ │ └ AtroposArgumentParser(prog='atropos trim', usage='\n atropos trim -a ADAPTER [options] [-o output.fastq] -se input.fas...
    │ │ └ Namespace(accession=None, action=<TrimAction.TRIM: 'trim'>, adapter_cache_file=PosixPath('/home/jpshaffer/.adapters'), adapte...
    │ └ <classmethod object at 0x7f8a6fe7b278>
    └ atropos.commands.trim.console.TrimCommandConsole
  File "/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/atropos/commands/trim/console.py", line 111, in _validate_options
    cls._validate_trim_options(options, parser)
    │ │ │ └ AtroposArgumentParser(prog='atropos trim', usage='\n atropos trim -a ADAPTER [options] [-o output.fastq] -se input.fas...
    │ │ └ Namespace(accession=None, action=<TrimAction.TRIM: 'trim'>, adapter_cache_file=PosixPath('/home/jpshaffer/.adapters'), adapte...
    │ └ <staticmethod object at 0x7f8a6fe7b2e8>
    └ atropos.commands.trim.console.TrimCommandConsole
  File "/home/jpshaffer/software/miniconda3/envs/shotgun_processing/lib/python3.6/site-packages/atropos/commands/trim/console.py", line 1219, in _validate_trim_options
    options.can_use_system_compression = fmt.can_use_system_compression()
    │ │ │ └ <property object at 0x7f8a7018e7c8>
    │ │ └ <xphyle.formats.Gzip object at 0x7f8a7018ad68>
    │ └ False
    └ Namespace(accession=None, action=<TrimAction.TRIM: 'trim'>, adapter_cache_file=PosixPath('/home/jpshaffer/.adapters'), adapte...

TypeError: 'bool' object is not callable

Any input regarding the error would be super helpful. Thanks in advance

jdidion commented 4 years ago

It looks like you've discovered a bug - I'll work on debugging it.

Also, please check out the change list: https://github.com/jdidion/atropos/blob/develop/CHANGES.md

The --nextseq-trim option has changed to --twocolor-trim.
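
For readers following along: the `TypeError` in the traceback above is consistent with a property being called as if it were a method. The traceback annotates `can_use_system_compression` as a property object with the value `False`, so the trailing `()` ends up calling `False`. The sketch below reproduces that pattern with a hypothetical stand-in class; it is not the xphyle or Atropos source.

```python
class Gzip:
    """Hypothetical stand-in for a compression-format class (not xphyle's code)."""

    @property
    def can_use_system_compression(self):
        # A property is read like an attribute and returns a plain bool here.
        return False


fmt = Gzip()

# Correct usage: attribute-style access, no parentheses.
value = fmt.can_use_system_compression

# Incorrect usage: the property access yields False, and the trailing ()
# then tries to call that bool, which raises TypeError.
try:
    fmt.can_use_system_compression()
except TypeError as err:
    message = str(err)

print(value)
print(message)
```

If that is what happened here, the fix on the calling side would simply be to drop the parentheses.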

jdidion commented 4 years ago

I fixed that issue and released a new build (2.0.0-alpha.4). Please try again.

If you run into more issues, it might be faster to give me a minimal dataset. That way I can run the same command that you're running and work through any issues without having to go back-and-forth each time.

justinshaffer commented 4 years ago

Thanks @jdidion!

I want to take a step back and try to address my first error when using version 1.1.24, as I feel you may be able to address my problem.

I'm most puzzled by the error, because I was able to successfully process the same files using the script previously. I ran into space limitations on our server, which caused the jobs to fail. After obtaining more disk space, I attempted to re-run, which is when I ran into the error that I've copied again below.

I wonder if there are some temporary files, or things being written to a location other than the output location I specified in my script, that are causing the error? It seems I only get the error when attempting to process files that I have already processed previously, but I have not tested this yet.

Do you have any ideas or thoughts along these lines? Thanks in advance.

What I ran:

atropos \
  -a GGGGGGGGGG \
  -A GGGGGGGGGG \
  -pe1 /sequencing/ucsd/complete_runs/191119_A00953_0026_BHW77GDSXX/Extraction_Test_Nextera_XT_Flex/stool_human_A_1_standard_H_XT_S97_L001_R1_001.fastq.gz \
  -pe2 /sequencing/ucsd/complete_runs/191119_A00953_0026_BHW77GDSXX/Extraction_Test_Nextera_XT_Flex/stool_human_A_1_standard_H_XT_S97_L001_R2_001.fastq.gz \
  -o /home/jpshaffer/illumina/xt_flex_khp_round02/data/atropos_polyg/nextera_xt/stool_human_A_1_standard_H_XT_S97_L001_R1_001_atropos_polyg.fastq.gz \
  -p /home/jpshaffer/illumina/xt_flex_khp_round02/data/atropos_polyg/nextera_xt/stool_human_A_1_standard_H_XT_S97_L001_R2_001_atropos_polyg.fastq.gz \
  --nextseq-trim 1 \
  -e 0.1 \
  -q 15 \
  --insert-match-error-rate 0.2 \
  --minimum-length 100 \
  --pair-filter any \
  --report-file /home/jpshaffer/illumina/xt_flex_khp_round02/data/atropos_polyg/nextera_xt/atropos_log_stool_human_A_1_standard_H_XT_S97_L001.txt \
  --report-formats txt \
  -T 16

The error:

2020-01-10 13:56:10,600 INFO: This is Atropos 1.1.24 with Python 3.6.8
2020-01-10 13:56:10,609 ERROR: Error executing command trim
Traceback (most recent call last):
  File "/home/jpshaffer/software/miniconda3/envs/multiqc/lib/python3.6/site-packages/atropos/commands/base.py", line 332, in run
    self.return_code = self()
  File "/home/jpshaffer/software/miniconda3/envs/multiqc/lib/python3.6/site-packages/atropos/commands/trim/__init__.py", line 295, in __call__
    adapter_cache = super().load_known_adapters()
  File "/home/jpshaffer/software/miniconda3/envs/multiqc/lib/python3.6/site-packages/atropos/commands/base.py", line 370, in load_known_adapters
    adapter_cache = AdapterCache(cache_file)
  File "/home/jpshaffer/software/miniconda3/envs/multiqc/lib/python3.6/site-packages/atropos/adapters/__init__.py", line 760, in __init__
    self.seq_to_name, self.name_to_seq = pickle.load(cache)
EOFError: Ran out of input

jdidion commented 4 years ago

The error is due to trying to load a corrupted adapter cache file. I added code to handle this in the develop branch, which is why I suggested trying out the 2.0.0* build. But I've also just back-ported the fix to the 1.1.x branch and released a new version (1.1.25). Please try it out.
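
As an illustration of what tolerating a corrupted cache can look like, here is a minimal sketch. It assumes the cache is a pickled pair of dictionaries, as the `pickle.load(cache)` line in the traceback suggests. The function name `load_adapter_cache` and the fall-back-to-empty behavior are assumptions made for this example, not the actual Atropos fix.

```python
import pickle
from pathlib import Path


def load_adapter_cache(cache_file):
    """Return (seq_to_name, name_to_seq); fall back to empty dicts if unreadable.

    Hypothetical helper: a truncated or empty pickle file raises EOFError or
    pickle.UnpicklingError, which are the two errors reported in this thread.
    """
    path = Path(cache_file)
    if not path.exists() or path.stat().st_size == 0:
        return {}, {}
    try:
        with path.open("rb") as cache:
            seq_to_name, name_to_seq = pickle.load(cache)
    except (EOFError, pickle.UnpicklingError):
        # Treat a corrupted cache as if there were no cache at all.
        return {}, {}
    return seq_to_name, name_to_seq
```

With a pattern like this, a damaged cache costs a rebuild rather than a failed run. The traceback from the 2.x run shows the cache path as `/home/jpshaffer/.adapters`, which is outside the output directory given on the command line.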

justinshaffer commented 4 years ago

Thanks, @jdidion. I sincerely appreciate that.

Just curious - any idea how that file gets corrupted? Does it have to do with re-processing - or perhaps with jobs being killed partway through?

Thanks in advance

jdidion commented 4 years ago

There are a couple of ways this could happen: the job was killed while the file was being written, or the file was written by a newer version of Python than the one used to read it (e.g., you changed Python versions between runs of the application). I don't think it's a multi-threading issue, but I will review the code to make sure.
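
One common way to guard against the first cause (a job killed mid-write, or a full disk) is to write the cache to a temporary file and rename it into place only once the write has completed. The sketch below shows that general pattern; `save_cache_atomically` is a hypothetical name, and nothing here is taken from the Atropos code base.

```python
import os
import pickle
import tempfile


def save_cache_atomically(cache_file, seq_to_name, name_to_seq):
    """Write the cache so readers see either the old file or the complete new one.

    Hypothetical helper illustrating the write-then-rename pattern.
    """
    directory = os.path.dirname(os.path.abspath(cache_file))
    # Create the temporary file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump((seq_to_name, name_to_seq), tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace overwrites the destination in a single step.
        os.replace(tmp_path, cache_file)
    except BaseException:
        # Clean up the partial temporary file and re-raise the original error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process dies before `os.replace` runs, the previous cache file is left untouched and only a stray temporary file remains.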

jdidion commented 4 years ago

I am closing this issue. Please re-open if you still experience the problem using the new version.