pepkit / peppy

Project metadata manager for PEPs in Python
https://pep.databio.org/peppy
BSD 2-Clause "Simplified" License

peppy creates projects with no sample name column #476

Closed: nsheff closed this issue 6 days ago

nsheff commented 5 months ago

Related to #473

Here's a CSV file, demo_fasta.csv:

assembly,local_file
demo0,data/demo/demo0.fasta
demo1,data/demo/demo1.fasta
demo2,data/demo/demo2.fasta
demo3,data/demo/demo3.fasta
demo4,data/demo/demo4.fasta
demo5,data/demo/demo5.fasta
demo6,data/demo/demo6.fasta

Here's a PEP YAML config, demo_fasta.yaml:

sample_annotation: demo_fasta.csv

Watch this:

  1. I can't load the CSV file directly, because it has no sample_name column. This is correct behavior:
>>> p = peppy.Project("analysis/config/demo_fasta.csv")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nsheff/.local/lib/python3.8/site-packages/peppy/project.py", line 163, in __init__
    self.create_samples(modify=False if self[SAMPLE_TABLE_FILE_KEY] else True)
  File "/home/nsheff/.local/lib/python3.8/site-packages/peppy/project.py", line 264, in create_samples
    self._assert_samples_have_names()
  File "/home/nsheff/.local/lib/python3.8/site-packages/peppy/project.py", line 561, in _assert_samples_have_names
    raise InvalidSampleTableFileException(message)
peppy.exceptions.InvalidSampleTableFileException: sample_table is missing 'sample_name' column; you must specify sample_tables in sample_name or derive them
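For readers unfamiliar with this check, here is a minimal stdlib sketch of what a "sample table must have a sample_name column" validation looks like when applied to the demo CSV above. `assert_samples_have_names` and `MissingSampleNameError` are hypothetical names chosen for illustration; this is not peppy's actual implementation, only the same idea.

```python
import csv
import io
from typing import Dict, List


class MissingSampleNameError(ValueError):
    """Hypothetical error raised when a sample table lacks a name column."""


def assert_samples_have_names(csv_text: str, name_column: str = "sample_name") -> List[Dict[str, str]]:
    """Parse a sample table and fail loudly if the name column is absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []  # reading fieldnames consumes the header row
    if name_column not in columns:
        raise MissingSampleNameError(
            f"sample table is missing {name_column!r} column; found columns: {columns}"
        )
    return list(reader)  # one dict per sample row


# First rows of demo_fasta.csv from this issue.
demo_fasta = """assembly,local_file
demo0,data/demo/demo0.fasta
demo1,data/demo/demo1.fasta
"""

try:
    assert_samples_have_names(demo_fasta)
except MissingSampleNameError as err:
    # prints: sample table is missing 'sample_name' column; found columns: ['assembly', 'local_file']
    print(err)
```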

BUT, there's no problem going through the YAML config (which provides nothing other than a pointer to the CSV):

>>> p = peppy.Project("analysis/config/demo_fasta.yaml")
Config file does not have version key. Defaulting to 2.1.0

This happily creates a project with no samples in it, despite the config pointing to the annotation table:

>>> p
Project
_config_file: analysis/config/demo_fasta.yaml
_sample_table_path: null
_subsample_tables_path: null
_config:
  sample_annotation: demo_fasta.csv
  pep_version: 2.1.0
st_index: sample_name
sst_index: 
 - sample_name
 - subsample_name
_samples: []
_samples_touched: False
is_private: False
progressbar: False
name: config
description: null
_sample_table: Empty DataFrame
Columns: []
Index: []
>>> p.samples
[]
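What is missing here is any warning about the unrecognized key. A minimal sketch of such a check, assuming the config has already been parsed from YAML into a dict: `lint_pep_config` and `KNOWN_KEYS` are hypothetical, and the key set is an assumed subset of PEP 2.x config keys (other tools may add their own sections, so a real check would need to allow those).

```python
import warnings
from typing import Any, Dict, List

# Assumed subset of recognized top-level PEP 2.x config keys (illustrative only).
KNOWN_KEYS = {
    "pep_version",
    "sample_table",
    "subsample_table",
    "sample_modifiers",
    "project_modifiers",
    "name",
    "description",
}


def lint_pep_config(config: Dict[str, Any]) -> List[str]:
    """Hypothetical linter: collect, warn about, and return suspicious config issues."""
    problems = []
    for key in config:
        if key not in KNOWN_KEYS:
            problems.append(f"unrecognized config key {key!r}")
    if "sample_table" not in config:
        problems.append("no 'sample_table' key; the project will have no samples")
    for message in problems:
        warnings.warn(message)  # emitted as UserWarning
    return problems


# The parsed config from this issue, after pep_version was defaulted to 2.1.0.
config = {"sample_annotation": "demo_fasta.csv", "pep_version": "2.1.0"}
lint_pep_config(config)
```

Run on the config shown above, it flags both `sample_annotation` and the absence of `sample_table`, which is the hint that would have avoided the confusion.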
nsheff commented 5 months ago

Interesting. The actual error is that I specified the sample table under the key sample_annotation instead of sample_table.

So the real problem is that peppy doesn't warn me about the missing sample_table key, which led to my confusion. When I correct that key, loading the config does give the error I expect:

>>> p = peppy.Project("analysis/config/demo_fasta.yaml")
Config file does not have version key. Defaulting to 2.1.0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nsheff/.local/lib/python3.8/site-packages/peppy/project.py", line 163, in __init__
    self.create_samples(modify=False if self[SAMPLE_TABLE_FILE_KEY] else True)
  File "/home/nsheff/.local/lib/python3.8/site-packages/peppy/project.py", line 262, in create_samples
    self.modify_samples()
  File "/home/nsheff/.local/lib/python3.8/site-packages/peppy/project.py", line 438, in modify_samples
    self._assert_samples_have_names()
  File "/home/nsheff/.local/lib/python3.8/site-packages/peppy/project.py", line 561, in _assert_samples_have_names
    raise InvalidSampleTableFileException(message)
peppy.exceptions.InvalidSampleTableFileException: sample_table is missing 'sample_name' column; you must specify sample_tables in sample_name or derive them
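The error message points at two remedies: provide a sample_name column or derive one. As a sketch of the first, assuming it is acceptable to reuse the `assembly` values as sample names, this stdlib helper (`add_sample_name_column`, a hypothetical name) copies an existing column into a new leading `sample_name` column and leaves tables that already have one untouched.

```python
import csv
import io


def add_sample_name_column(csv_text: str, source_column: str) -> str:
    """Return CSV text with a sample_name column copied from source_column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if "sample_name" in fieldnames:
        return csv_text  # already valid; nothing to add
    if source_column not in fieldnames:
        raise KeyError(f"column {source_column!r} not found in {fieldnames}")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["sample_name"] + fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Copy the chosen column's value into the new sample_name field.
        writer.writerow({"sample_name": row[source_column], **row})
    return out.getvalue()


# First rows of demo_fasta.csv from this issue.
demo_fasta = """assembly,local_file
demo0,data/demo/demo0.fasta
demo1,data/demo/demo1.fasta
"""
print(add_sample_name_column(demo_fasta, "assembly"))
```

For the two demo rows this prints a header of `sample_name,assembly,local_file` followed by `demo0,demo0,data/demo/demo0.fasta` and `demo1,demo1,data/demo/demo1.fasta`.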
khoroshevskyi commented 5 months ago

#473

nsheff commented 5 months ago

That is not related. It is a red herring.