pepkit / peppy

Project metadata manager for PEPs in Python
https://pep.databio.org/peppy
BSD 2-Clause "Simplified" License
37 stars 13 forks source link

Peppy does not create correct list of `Sample()` objects stored in `self._samples` #408

Closed rafalstepien closed 1 month ago

rafalstepien commented 2 years ago

Peppy does not create correct list of Sample() objects stored in self._samples.

If there is a column in sample_table.csv or subsample_table.csv that is empty for any row, then peppy does not create Sample() object with this attribute. The problem is that this is not true, and Sample() instance should have this attribute with value None. To better explain:

Given following sample_table.csv:

| sample_name | fastq_1          | fastq_2          |
|-------------|------------------|------------------|
| name_1      | /path/to_fastq_1 | /path/to_fastq_2 |
| name_2      | /path/to_fastq_1 |                  |
| name_3      | /path/to_fastq_1 | /path/to_fastq_2 |

peppy will create the following structure:

self._samples = [
    Sample(
        sample: name_1
        fastq_1:
            -  /path/to_fastq_1
        fastq_2:
            -  /path/to_fastq_2
    ),
    Sample(
        sample: name_2
        fastq_1:
            -  /path/to_fastq_1
    ),
    Sample(
        sample: name_3
        fastq_1:
            -  /path/to_fastq_1
        fastq_2:
            -  /path/to_fastq_2
    ),
]

And any attempt to access self._samples[1].fastq_2 will raise an attribute error. In my opinion the correct structure should be:

self._samples = [
    Sample(
        sample: name_1
        fastq_1:
            -  /path/to_fastq_1
        fastq_2:
            -  /path/to_fastq_2
    ),
    Sample(
        sample: name_2
        fastq_1:
            -  /path/to_fastq_1
        fastq_2:
            - None
    ),
    Sample(
        sample: name_3
        fastq_1:
            -  /path/to_fastq_1
        fastq_2:
            -  /path/to_fastq_2
    ),
]
vreuter commented 2 years ago

Hi! Just a reminder to in general not rely on access to attributes prefixed with _ since the correctness is often less thoroughly vetted and they're more likely to change, but thanks for catching this and opening a ticket @rafalstepien !

rafalstepien commented 2 years ago

Hi @vreuter thanks for the comment. You are absolutely right. I mentioned self._samples but I meant self.samples. If you look to the code, accessing self.samples simply returns self._samples if it is present, that's why I used the shortcut and went directly for _samples.

khoroshevskyi commented 1 month ago

This issue was solved in https://github.com/pepkit/peppy/pull/409