pepkit / peppy

Project metadata manager for PEPs in Python
https://pep.databio.org/peppy
BSD 2-Clause "Simplified" License

Allow a merged sample where some attributes are missing values #395

Closed rafalstepien closed 1 month ago

rafalstepien commented 2 years ago

I encountered a problem where one sample in the sample_table had two runs, one paired-end and one single-end, and I received the following error:

(databio) cgf8xr@cphg-fqvt2j3:~/databio/repos/pep-nextflow/pseudo_nextflow_task$ eido validate --st-index sample nextflow_files/samplesheet.csv -s samplesheet_schema.yaml -e
Found 1 samples with non-unique names: {'2612'}. Attempting to auto-merge.
Traceback (most recent call last):
  File "/home/cgf8xr/databio/venvs/databio/lib/python3.8/site-packages/attmap/pathex_attmap.py", line 39, in __getattr__
    v = super(PathExAttMap, self).__getattribute__(item)
AttributeError: 'Sample' object has no attribute 'fastq_2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cgf8xr/databio/venvs/databio/lib/python3.8/site-packages/attmap/ordattmap.py", line 46, in __getitem__
    return super(OrdAttMap, self).__getitem__(item)
KeyError: 'fastq_2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cgf8xr/databio/venvs/databio/lib/python3.8/site-packages/attmap/pathex_attmap.py", line 42, in __getattr__
    return self.__getitem__(item, expand)
  File "/home/cgf8xr/databio/venvs/databio/lib/python3.8/site-packages/attmap/pathex_attmap.py", line 59, in __getitem__
    v = super(PathExAttMap, self).__getitem__(item)
  File "/home/cgf8xr/databio/venvs/databio/lib/python3.8/site-packages/attmap/ordattmap.py", line 48, in __getitem__
    return AttMap.__getitem__(self, item)
  File "/home/cgf8xr/databio/venvs/databio/lib/python3.8/site-packages/attmap/attmap.py", line 32, in __getitem__
    return self.__dict__[item]
KeyError: 'fastq_2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cgf8xr/databio/venvs/databio/bin/eido", line 8, in <module>
    sys.exit(main())
  File "/home/cgf8xr/databio/venvs/databio/lib/python3.8/site-packages/eido/cli.py", line 104, in main
    p = Project(cfg=args.pep, sample_table_index=args.st_index)
  File "/home/cgf8xr/databio/venvs/databio/lib/python3.8/site-packages/peppy/project.py", line 138, in __init__
    self.create_samples(modify=False if self[SAMPLE_TABLE_FILE_KEY] else True)
  File "/home/cgf8xr/databio/venvs/databio/lib/python3.8/site-packages/peppy/project.py", line 164, in create_samples
    self._auto_merge_duplicated_names()
  File "/home/cgf8xr/databio/venvs/databio/lib/python3.8/site-packages/peppy/project.py", line 484, in _auto_merge_duplicated_names
    flatten([getattr(s, attr) for s in dup_samples])
  File "/home/cgf8xr/databio/venvs/databio/lib/python3.8/site-packages/peppy/project.py", line 484, in <listcomp>
    flatten([getattr(s, attr) for s in dup_samples])
  File "/home/cgf8xr/databio/venvs/databio/lib/python3.8/site-packages/attmap/pathex_attmap.py", line 46, in __getattr__
    raise AttributeError(item)
AttributeError: fastq_2

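I think peppy must handle this case: when duplicated samples are auto-merged, an attribute that is missing from one of them (here fastq_2) should not raise an AttributeError.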

Example file: samplesheet.csv

According to the Nextflow people (providers of the sample table) this example is valid because "You can sequence the same library across different platforms and chemistries, so you could have different run types for one or different libraries of the same sample (this regularly happens in aDNA)".
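To make the shape of such a table concrete, here is a minimal sketch. The table contents, the `fastq_1` column, and the file names are hypothetical and are not taken from the attached samplesheet.csv; only the sample name `2612` and the `fastq_2` attribute come from the error above. Dropping empty cells is an assumption about how a missing value ends up as a missing attribute.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample table: sample 2612 has one paired-end run and one
# single-end run, so its second row has no fastq_2 value.
SAMPLESHEET = """sample,fastq_1,fastq_2
2612,run1_R1.fastq.gz,run1_R2.fastq.gz
2612,run2_R1.fastq.gz,
2613,run3_R1.fastq.gz,run3_R2.fastq.gz
"""


def rows_by_sample(text):
    """Group rows by sample name, dropping empty cells (an assumption)."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        grouped[row["sample"]].append({k: v for k, v in row.items() if v})
    return dict(grouped)


groups = rows_by_sample(SAMPLESHEET)
# Names that occur more than once are the ones that would need merging.
duplicated = {name for name, rows in groups.items() if len(rows) > 1}
```

In this sketch the two rows for `2612` end up with different attribute sets, which is the situation a merge step has to tolerate.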

nsheff commented 2 years ago

Ok, agreed.

Peppy should not choke on this or raise an error; we should handle it and just add those attributes as appropriate.
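One way to read "add those attributes as appropriate" is sketched below. This is not peppy's implementation and not necessarily what the linked pull request does: plain dicts stand in for Sample objects, and `merge_duplicates` is a hypothetical helper name. The idea is to take the union of attribute names across the duplicated samples and collect only the values that are actually present.

```python
from collections import defaultdict


def merge_duplicates(samples, index="sample"):
    """Merge dicts that share an index value; missing attributes are skipped.

    Hypothetical helper for illustration; plain dicts stand in for samples.
    """
    grouped = defaultdict(list)
    for s in samples:
        grouped[s[index]].append(s)

    merged = []
    for name, dups in grouped.items():
        if len(dups) == 1:
            # Unique name: nothing to merge, keep the sample unchanged.
            merged.append(dict(dups[0]))
            continue
        # Union of attribute names across duplicates, in first-seen order.
        attrs = list(dict.fromkeys(k for s in dups for k in s))
        out = {index: name}
        for attr in attrs:
            if attr == index:
                continue
            # Collect only the values that exist instead of failing on a gap.
            out[attr] = [s[attr] for s in dups if attr in s]
        merged.append(out)
    return merged
```

With this approach a merged sample can hold lists of different lengths, for example two `fastq_1` entries and one `fastq_2` entry. An alternative would be to pad missing values with `None` so that positions stay aligned across attributes; which behavior is preferable depends on how downstream tools pair the files.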

rafalstepien commented 2 years ago

https://github.com/pepkit/peppy/pull/396

khoroshevskyi commented 1 month ago

After testing, this issue seems to be fixed. Closing.