kfuku52 / amalgkit

RNA-seq data amalgamation for large-scale evolutionary transcriptomics
BSD 3-Clause "New" or "Revised" License

AttributeError: 'Namespace' object has no attribute 'updated_metadata_dir' #82

Closed by kfuku52 1 year ago

kfuku52 commented 2 years ago

amalgkit integrate returned the error below. I don't think it depends on the input sample type. @Hego-CCTB Could you take a look?

Traceback (most recent call last):
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/bin/amalgkit", line 378, in <module>
    args.handler(args)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/bin/amalgkit", line 89, in command_integrate
    integrate_main(args)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/amalgkit/integrate.py", line 104, in integrate_main
    data_available, data_unavailable = check_getfastq_outputs(args, sra_ids, metadata, args.out_dir)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/amalgkit/sanity.py", line 69, in check_getfastq_outputs
    if args.updated_metadata_dir:
AttributeError: 'Namespace' object has no attribute 'updated_metadata_dir'

kfuku52 commented 2 years ago

Here's the stdout:

amalgkit integrate: start
checking SeqKit dependency
SeqKit dependency satisfied. Moving on.
found metadata 

reading metadata from: ./metadata/metadata/metadata_03_curated_1900_01_01-2021_10_15.tsv
scanning for getfastq output
checking for getfastq outputs: 
amalgkit getfastq output folder detected. Checking presence of output files.

Looking for  DRR053705
getfastq safely_removed flag was detected. `amalgkit quant` has been completed in this sample: ./getfastq/DRR053705
./getfastq/DRR053705/DRR053705_2.amalgkit.fastq.gz.safely_removed
./getfastq/DRR053705/DRR053705_1.amalgkit.fastq.gz.safely_removed

Hego-CCTB commented 2 years ago

This is a very curious error.

I'll need a bit more info to recreate that error on my end. Can you show me the amalgkit command you ran?

I'm also curious if this happens for private getfastq files as well, or just public ones that have been safely removed.

kfuku52 commented 2 years ago

The command was:

amalgkit integrate \
--out_dir ./ \
--fastq_dir ../local_fastq/${sp} \
--metadata ./metadata/metadata/metadata_03_curated_1900_01_01-2021_10_15.tsv \
--threads ${NSLOTS}

I'll share the metadata if it's difficult to reproduce on your end.

Hego-CCTB commented 2 years ago

The command looks okay. So the public files are in ./getfastq/ but have been safely removed, and the private files are in a different directory.

This should be enough info for me to reproduce.

Hego-CCTB commented 2 years ago

I was able to reproduce the error.

Should be fixed in https://github.com/kfuku52/amalgkit/commit/5b8adf7c56d3351018089d4410c9151b7e1ae324
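For readers hitting a similar traceback: the linked commit is not reproduced here, so the following is only a minimal sketch of how this class of error arises and one common way to guard against it. It assumes the shared check function reads an option that the subcommand's parser never defined; the `demo` parser and `check_outputs` function are hypothetical stand-ins, not amalgkit code.

```python
import argparse


def check_outputs(args):
    # getattr with a default avoids AttributeError when the calling
    # subcommand's parser never registered this option.
    updated_dir = getattr(args, "updated_metadata_dir", None)
    if updated_dir:
        return f"using updated metadata from {updated_dir}"
    return "using the metadata given on the command line"


# Hypothetical parser: the subcommand defines --out_dir but not
# --updated_metadata_dir, so the parsed Namespace lacks that attribute.
parser = argparse.ArgumentParser(prog="demo")
subparsers = parser.add_subparsers(dest="command")
integrate = subparsers.add_parser("integrate")
integrate.add_argument("--out_dir", default="./")

args = parser.parse_args(["integrate", "--out_dir", "./"])
print(hasattr(args, "updated_metadata_dir"))
print(check_outputs(args))
```

An alternative with the same effect is to register the option, with a default of `None`, on every subcommand that calls the shared function.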

kfuku52 commented 2 years ago

Another error appeared with the same command.

Traceback (most recent call last):
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'avg_len'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/bin/amalgkit", line 378, in <module>
    args.handler(args)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/bin/amalgkit", line 89, in command_integrate
    integrate_main(args)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/amalgkit/integrate.py", line 131, in integrate_main
    tmp_metadata = get_fastq_stats(args)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/amalgkit/integrate.py", line 95, in get_fastq_stats
    tmp_metadata.loc[row,'spot_length']= tmp_stat_df.loc[0,'avg_len']
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/indexing.py", line 889, in __getitem__
    return self._getitem_tuple(key)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/indexing.py", line 1060, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/indexing.py", line 831, in _getitem_lowerdim
    return getattr(section, self.name)[new_key]
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/indexing.py", line 895, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/indexing.py", line 1124, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/indexing.py", line 1073, in _get_label
    return self.obj.xs(label, axis=axis)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/generic.py", line 3738, in xs
    loc = index.get_loc(key)
  File "/home/kfuku/.pyenv/versions/miniconda3-4.3.30/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'avg_len'

Hego-CCTB commented 2 years ago

My best guess is that this has something to do with seqkit. avg_len is read from a temporary file that is generated from seqkit output. Since amalgkit isn't terminated while seqkit is running, I suspect the problem is seqkit-adjacent: maybe a corrupted .tmp file, or a change in seqkit's output format after an update.
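A check of the kind described above could look roughly like the sketch below. It assumes the temporary file holds tab-delimited statistics with a header row that includes an `avg_len` column, as `seqkit stats -T` produces; the helper name `read_avg_len` and the sample file name are hypothetical, and the standard-library `csv` module stands in for the pandas lookup shown in the traceback.

```python
import csv
import io


def read_avg_len(stats_text):
    """Return avg_len from tab-delimited stats text, with clear errors."""
    reader = csv.DictReader(io.StringIO(stats_text), delimiter="\t")
    rows = list(reader)
    if not rows:
        raise ValueError("stats output is empty or has no data row")
    value = rows[0].get("avg_len")
    if value is None:
        # Report the columns that were actually found, which makes a
        # changed or corrupted output format easier to diagnose.
        raise ValueError(f"'avg_len' column missing; found: {reader.fieldnames}")
    return float(value)


# Hypothetical sample of well-formed output.
good = (
    "file\tformat\ttype\tnum_seqs\tsum_len\tmin_len\tavg_len\tmax_len\n"
    "sample_1.fastq.gz\tFASTQ\tDNA\t1000\t100000\t100\t100.0\t100\n"
)
print(read_avg_len(good))

# A truncated file raises a descriptive error instead of a bare KeyError.
try:
    read_avg_len("file\tformat\n" "sample_1.fastq.gz\tFASTQ\n")
except ValueError as err:
    print(err)
```

Failing with the list of columns that were found would point directly at a corrupted temporary file or a changed output format.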

    os.remove(tmp_file)

kfuku52 commented 2 years ago

Sorry, but I have no memory of any bugs from three months ago... Did you reproduce the error?

Hego-CCTB commented 2 years ago

Yeah, not your fault, of course! integrate seems to work fine for me, so this may be specific to a particular SRA ID, FASTQ file, or seqkit installation.

I will implement an option for keeping the seqkit output, in case this comes up again in the future.

Hego-CCTB commented 1 year ago

I never closed this. The argument for keeping the seqkit output is --remove_tmp yes|no.
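As an illustration only, a yes|no switch of this kind can be wired up with argparse as sketched below. Only the flag name and its two values come from this thread; the default of `yes`, the `demo` parser, and the `finish` helper are assumptions and do not show how amalgkit implements it.

```python
import argparse
import os
import tempfile


def build_parser():
    parser = argparse.ArgumentParser(prog="demo")
    # Assumed default: remove the temporary file unless told otherwise.
    parser.add_argument("--remove_tmp", choices=["yes", "no"], default="yes")
    return parser


def finish(tmp_file, remove_tmp):
    """Delete tmp_file when remove_tmp is 'yes'; otherwise keep it."""
    if remove_tmp == "yes":
        os.remove(tmp_file)
        return None
    return tmp_file


args = build_parser().parse_args(["--remove_tmp", "no"])
fd, path = tempfile.mkstemp(suffix=".tmp")
os.close(fd)
kept = finish(path, args.remove_tmp)
print(kept is not None and os.path.exists(kept))
os.remove(path)  # clean up the demo file
```

Keeping the file with `--remove_tmp no` leaves the intermediate output available for inspection if the `avg_len` error comes up again.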