Closed: kfuku52 closed this issue 2 years ago
Here's the stdout
amalgkit integrate: start
checking SeqKit dependency
SeqKit dependency satisfied. Moving on.
found metadata
reading metadata from: ./metadata/metadata/metadata_03_curated_1900_01_01-2021_10_15.tsv
scanning for getfastq output
checking for getfastq outputs:
amalgkit getfastq output folder detected. Checking presence of output files.
Looking for DRR053705
getfastq safely_removed flag was detected. `amalgkit quant` has been completed in this sample: ./getfastq/DRR053705
./getfastq/DRR053705/DRR053705_2.amalgkit.fastq.gz.safely_removed
./getfastq/DRR053705/DRR053705_1.amalgkit.fastq.gz.safely_removed
This is a very curious error.
I'll need a bit more info to recreate that error on my end. Can you show me the amalgkit command you ran?
I'm also curious if this happens for private getfastq files as well, or just public ones that have been safely removed.
The command was:
amalgkit integrate \
--out_dir ./ \
--fastq_dir ../local_fastq/${sp} \
--metadata ./metadata/metadata/metadata_03_curated_1900_01_01-2021_10_15.tsv \
--threads ${NSLOTS}
I'll share the metadata if it's difficult to reproduce on your end.
The command looks okay.
So, the public files are in ./getfastq/ but have been safely removed, and the private files are in a different directory.
This should be enough info for me to reproduce.
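For anyone trying to reproduce this layout, here is a minimal sketch of how the safely_removed flag files shown in the stdout could be located. The helper name `find_safely_removed_flags` is hypothetical and this is not amalgkit's own code; the file name pattern is taken only from the two paths printed in the log above.

```python
import glob
import os


def find_safely_removed_flags(getfastq_dir, run_id):
    """Return sorted paths of *.safely_removed flag files for one run.

    Hypothetical helper: the suffix is assumed from the log lines above,
    e.g. ./getfastq/DRR053705/DRR053705_1.amalgkit.fastq.gz.safely_removed
    """
    # Escape the directory part so unusual characters are not read as wildcards.
    run_dir = glob.escape(os.path.join(getfastq_dir, run_id))
    pattern = os.path.join(run_dir, run_id + "*.amalgkit.fastq.gz.safely_removed")
    return sorted(glob.glob(pattern))
```

An empty list would mean that no flag files exist for that run, so the FASTQ files themselves should still be in place.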
I was able to reproduce the error.
Should be fixed in https://github.com/kfuku52/amalgkit/commit/5b8adf7c56d3351018089d4410c9151b7e1ae324
Another error appeared with the same command.
Traceback (most recent call last):
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'avg_len'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/bin/amalgkit", line 378, in <module>
    args.handler(args)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/bin/amalgkit", line 89, in command_integrate
    integrate_main(args)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/amalgkit/integrate.py", line 131, in integrate_main
    tmp_metadata = get_fastq_stats(args)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/amalgkit/integrate.py", line 95, in get_fastq_stats
    tmp_metadata.loc[row,'spot_length']= tmp_stat_df.loc[0,'avg_len']
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/indexing.py", line 889, in __getitem__
    return self._getitem_tuple(key)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/indexing.py", line 1060, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/indexing.py", line 831, in _getitem_lowerdim
    return getattr(section, self.name)[new_key]
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/indexing.py", line 895, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/indexing.py", line 1124, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/indexing.py", line 1073, in _get_label
    return self.obj.xs(label, axis=axis)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/generic.py", line 3738, in xs
    loc = index.get_loc(key)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'avg_len'
My best guess is that this has something to do with seqkit. avg_len is read from a temporary file that is generated from seqkit output. Since amalgkit isn't terminated while seqkit is running, I suspect the cause is seqkit-adjacent: maybe a corrupted .tmp file, or a change in seqkit output after an update.
The temporary file is deleted by os.remove(tmp_file) in integrate.py. Keeping it could give some insight as to why there is no avg_len. I should implement the preservation of tmp files as a user option.
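To make a missing avg_len easier to diagnose, the stats table could be checked before any column is accessed. Below is a minimal sketch under stated assumptions: it uses only the standard library rather than the pandas code in integrate.py, the function name `read_stats_table` is hypothetical, and it assumes the temporary file is tab-separated with a header row, as produced by `seqkit stats -T`.

```python
import csv

# Columns the caller needs; both appear in the header of `seqkit stats -T`.
REQUIRED_COLUMNS = ("num_seqs", "avg_len")


def read_stats_table(handle, required=REQUIRED_COLUMNS):
    """Read a tab-separated stats table and fail with a readable message.

    Hypothetical helper, not amalgkit's implementation. Raises ValueError
    when the table is empty or a required column is absent.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    found = reader.fieldnames or []  # None when the file is completely empty
    missing = [name for name in required if name not in found]
    if missing or not rows:
        raise ValueError(
            "Unexpected stats table: missing columns {}, found columns {}, "
            "{} data rows".format(missing, list(found), len(rows))
        )
    return rows
```

With a check like this, an empty or truncated temporary file would be reported together with the columns that were actually found, instead of surfacing as a bare KeyError from deep inside pandas.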
Sorry but I have no memory of any bugs from three months ago... Did you reproduce the error?
Yeah, not your fault, of course!
integrate seems to work fine for me, so this may be particular to a certain SRA-ID or FASTQ file, or seqkit installation.
I will implement an option for keeping the seqkit output, in case this comes up again in the future.
Never closed this. The argument to keep the output is --remove_tmp yes|no
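For reference, a rough sketch of how a yes|no switch like this can gate the cleanup step. Only the flag name `--remove_tmp` and its two values come from this thread; the parser, the default of `yes`, and the helper `cleanup_tmp` are assumptions made for illustration and may differ from what amalgkit does.

```python
import argparse
import os


def build_parser():
    """Build a toy parser that exposes only the --remove_tmp switch."""
    parser = argparse.ArgumentParser(prog="integrate-sketch")
    parser.add_argument(
        "--remove_tmp",
        choices=["yes", "no"],
        default="yes",  # assumed default, not confirmed in this thread
        help="Whether to delete the temporary stats file after reading it.",
    )
    return parser


def cleanup_tmp(tmp_file, remove_tmp):
    """Delete tmp_file only when remove_tmp is 'yes'; return True if deleted."""
    if remove_tmp == "yes" and os.path.exists(tmp_file):
        os.remove(tmp_file)
        return True
    return False
```

Running with `--remove_tmp no` would then leave the seqkit output on disk so it can be inspected if the avg_len error comes up again.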
amalgkit integrate returned this error. I think this error does not depend on the input sample type. @Hego-CCTB Could you take a look?