Closed AroArz closed 10 months ago
It does not impact the downstream pipeline. If the filename is empty (NA), the default path in atlas is used and the pipeline should work.
However, it would still be better to write the correct names there. You said there are only some NA rows. So you know the pattern for imputing them.
I'll keep this issue open and try to fix it in a later version.
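For reference, the imputation mentioned above could be sketched roughly as follows. This is only a sketch under assumptions: the table is tab-separated, the first column holds the sample name, and the QC files follow the pattern visible in the qcreads output paths in this thread (<sample>/sequence_quality_control/<sample>_QC_<fraction>.fastq.gz). The function name impute_qc_paths and the demo table are hypothetical and not part of atlas.

```shell
#!/bin/bash
set -e

# Fill empty, NA or NaN Reads_QC_* cells in a tab-separated sample table.
# Assumptions (not confirmed by atlas): column 1 is the sample name and the
# header names the columns Reads_QC_R1, Reads_QC_R2 and Reads_QC_se.
impute_qc_paths() {
    awk 'BEGIN { FS = OFS = "\t" }
    NR == 1 {
        # Remember which column index belongs to which read fraction
        for (i = 1; i <= NF; i++)
            if ($i ~ /^Reads_QC_(R1|R2|se)$/) {
                frac[i] = $i
                sub(/^Reads_QC_/, "", frac[i])
            }
        print
        next
    }
    {
        # Replace missing cells with the assumed default path for that sample
        for (i in frac)
            if ($i == "" || $i == "NA" || $i == "NaN")
                $i = $1 "/sequence_quality_control/" $1 "_QC_" frac[i] ".fastq.gz"
        print
    }' "$1"
}

# Hypothetical demo table: one row with all three QC paths missing
demo=$(mktemp)
printf '\tReads_QC_R1\tReads_QC_R2\tReads_QC_se\n' > "$demo"
printf 'S866\tNaN\t\tNA\n' >> "$demo"
impute_qc_paths "$demo"
```

The function prints the corrected table to standard output, so the original file stays untouched until the result has been checked.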
I'll continue writing here. I'm rerunning atlas, this time with many more samples and atlas is crashing on qcreads
complaining about empty values in BinGroup. I've specified about 10 bingroups titled
"BG1", "BG2" ... "BG10", "BGmock"
There are no NaNs in this column and no empty strings. BinGroups are <150 in size.
Help appreciated.
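One way to double-check the BinGroup column outside of atlas is a small awk scan. This is a minimal sketch, assuming the sample table is tab-separated, has a header column named exactly "BinGroup", and keeps the sample name in the first column. The function name find_empty_bingroups and the demo table are made up for illustration.

```shell
#!/bin/bash
set -e

# Print every row whose BinGroup cell is empty, NA or NaN after trimming.
# Rows that are too short to contain the column are reported as well,
# because the missing cell evaluates to an empty string.
find_empty_bingroups() {
    awk 'BEGIN { FS = "\t" }
    NR == 1 {
        for (i = 1; i <= NF; i++) if ($i == "BinGroup") col = i
        if (!col) { print "no BinGroup column found" > "/dev/stderr"; exit 2 }
        next
    }
    {
        value = $col
        gsub(/^[ \t\r]+|[ \t\r]+$/, "", value)   # also strips a stray carriage return
        if (value == "" || value == "NA" || value == "NaN")
            printf "line %d (%s): BinGroup is \"%s\"\n", NR, $1, value
    }' "$1"
}

# Hypothetical demo table: S2 has an empty cell, S3 is a truncated row
demo=$(mktemp)
printf '\tBinGroup\nS1\tBG1\nS2\t\nS3\n' > "$demo"
find_empty_bingroups "$demo"
```

If the scan prints nothing for the real table, the column itself is probably fine and the complaint more likely comes from how or when the file is being read.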
Occasionally it will also produce the following error:
Error in rule qcreads:
jobid: 0
input: S866/sequence_quality_control/S866_clean_R1.fastq.gz, S866/sequence_quality_control/S866_clean_R2.fastq.gz, S866/sequence_quality_control/S866_clean_se.fastq.gz
output: S866/sequence_quality_control/S866_QC_R1.fastq.gz, S866/sequence_quality_control/S866_QC_R2.fastq.gz, S866/sequence_quality_control/S866_QC_se.fastq.gz
RuleException:
EmptyDataError in file /crex/proj/snic2020-6-233/envs/atlas2/lib/python3.10/site-packages/atlas/workflow/rules/qc.smk, line 440:
No columns to parse from file
File "/crex/proj/snic2020-6-233/envs/atlas2/lib/python3.10/site-packages/atlas/workflow/rules/qc.smk", line 440, in __rule_qcreads
File "/crex/proj/snic2020-6-233/envs/atlas2/lib/python3.10/site-packages/atlas/sample_table.py", line 64, in load_sample_table
File "/crex/proj/snic2020-6-233/envs/atlas2/lib/python3.10/site-packages/pandas/util/_decorators.py", line 311, in wrapper
File "/crex/proj/snic2020-6-233/envs/atlas2/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 678, in read_csv
File "/crex/proj/snic2020-6-233/envs/atlas2/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 575, in _read
File "/crex/proj/snic2020-6-233/envs/atlas2/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 932, in __init__
File "/crex/proj/snic2020-6-233/envs/atlas2/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1234, in _make_engine
File "/crex/proj/snic2020-6-233/envs/atlas2/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 75, in __init__
File "pandas/_libs/parsers.pyx", line 551, in pandas._libs.parsers.TextReader.__cinit__
File "/crex/proj/snic2020-6-233/envs/atlas2/lib/python3.10/concurrent/futures/thread.py", line 58, in run
I think I know where the bug is. Can you send me the full sample.tsv, please?
Sent! When I started the run for the first time, atlas had generated a short string in a new row at the end of the csv, which I believe caused the first error, so I promptly removed it. After each qcread error, if I restart the pipeline, the sample which produced the error is resubmitted and completes without errors. Errors, however, seem to occur for many samples; I've restarted it about 10 times so far and it is progressing, slowly.
The qcread step essentially copies the input files to the output files and adds them to the sample.tsv. However, multiple threads reading and writing the sample.tsv at the same time cause errors.
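This kind of failure is typical for a file that is rewritten in place while other jobs read it: a reader that opens the table right after it was truncated sees an empty file, which matches the "No columns to parse from file" message. A general way to avoid that (not necessarily how atlas handles it) is to write the new table to a temporary file in the same directory and then rename it over the old one. The helper name atomic_replace and the demo content below are hypothetical.

```shell
#!/bin/bash
set -e

# Replace a file so that readers see either the old or the new version,
# never a half-written one.
# Usage: some_command | atomic_replace samples.tsv
atomic_replace() {
    local target=$1
    local tmp
    # Keep the temporary file in the target's directory: a rename within
    # one filesystem is atomic, a move across filesystems is not.
    tmp=$(mktemp "$(dirname "$target")/.$(basename "$target").XXXXXX")
    cat > "$tmp"             # write the complete new content first
    mv -f "$tmp" "$target"   # then swap it in with a single rename
}

# Hypothetical demo: append one sample to a small table
workdir=$(mktemp -d)
printf '\tBinGroup\nS1\tBG1\n' > "$workdir/samples.tsv"
printf '\tBinGroup\nS1\tBG1\nS2\tBG2\n' | atomic_replace "$workdir/samples.tsv"
cat "$workdir/samples.tsv"
```

This only protects readers. Two jobs that update the table at the same time can still overwrite each other's changes, so concurrent writers would additionally need a lock or a single job that does all the updates. The replaced file also inherits the restrictive permissions that mktemp sets.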
I suggest using this script to copy the files yourself, then delete the .snakemake folder and run atlas run qc.
#!/bin/bash
set -e
# Get a list of all samples that have a clean_R1 file (regular files, not directories)
samples=$(find . -type f -name "*clean_R1*.fastq.gz" | cut -d/ -f2)
# For each sample, copy the input files to the output files
for sample in $samples; do
    # Fractions match the file names used by qcreads: R1, R2 and se
    for fraction in R1 R2 se; do
        cp -v "$sample/sequence_quality_control/${sample}_clean_${fraction}.fastq.gz" "$sample/sequence_quality_control/${sample}_QC_${fraction}.fastq.gz"
    done
done
I will then send you a correct sample.tsv
Fixed in atlas v2.18.1.
Hello Silas. I was able to run atlas 2.18.0; however, I noticed that for some of my samples in samples.tsv there are NaNs in the following columns: Reads_QC_R1, Reads_QC_R2, Reads_QC_se.
Not really sure what this means, as the files do exist at the paths corresponding to those listed for the other samples. All other columns are filled out appropriately. Writing to see if there is perhaps something I missed and whether this would have had any effect on the downstream processing. I have QC stats, assemblies, bins and mapping counts for these samples, so I'm a bit confused. Thanks!
Atlas version 2.18.0