pepkit / peppy

Project metadata manager for PEPs in Python
https://pep.databio.org/peppy
BSD 2-Clause "Simplified" License
37 stars 12 forks source link

the project missing `sample_name`, but with `sample_table_index` set correctly, cannot run in looper #459

Closed donaldcampbelljr closed 1 month ago

donaldcampbelljr commented 7 months ago

Unclear if this is actually a looper issue

khoroshevskyi commented 7 months ago

I just checked all peppy tests related to this issue, and it seems, this is not a peppy issue...

nsheff commented 7 months ago

Is there some way of getting the sample's index (whether it be from sample_name or some custom attribute) from a peppy sample object?

khoroshevskyi commented 7 months ago
donaldcampbelljr commented 7 months ago

Traceback from Looper:

/home/drc/GITHUB/looper/master/looper/venv/bin/python -m looper run --looper-config .looper.yaml 
Looper version: 1.6.0a2
Command: run
Using default divvy config. You may specify in env var: ['DIVCFG']

Pipestat compatible: False
Traceback (most recent call last):
  File "/home/drc/GITHUB/looper/master/looper/venv/lib/python3.10/site-packages/peppy/simple_attr_map.py", line 45, in __getattr__
    return self._mapped_attr[item]
KeyError: 'sample_name'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/drc/GITHUB/looper/master/looper/looper/__main__.py", line 8, in <module>
    sys.exit(main())
  File "/home/drc/GITHUB/looper/master/looper/looper/cli_looper.py", line 715, in main
    return run(args, rerun=(args.command == "rerun"), **compute_kwargs)
  File "/home/drc/GITHUB/looper/master/looper/looper/looper.py", line 457, in __call__
    name=sample.sample_name,
  File "/home/drc/GITHUB/looper/master/looper/venv/lib/python3.10/site-packages/peppy/simple_attr_map.py", line 47, in __getattr__
    raise AttributeError(f"Attribute not found: {item}")
AttributeError: Attribute not found: sample_name

Process finished with exit code 1

This is if I attempt to use project config:

pep_version: 2.0.0
sample_table: sample_annotation.csv
sample_table_index: not_sample_name

with sample annotation:

not_sample_name,library,file,toggle
frog_1,anySampleType,data/frog1_data.txt,1
frog_2,anySampleType,data/frog2_data.txt,1
donaldcampbelljr commented 6 months ago

image

donaldcampbelljr commented 6 months ago

Confirmed that the issue is here in Peppy: https://github.com/pepkit/peppy/blob/c0affde7f93ef4cb084aeec25a8d82155d35299c/peppy/project.py#L641-L661

khoroshevskyi commented 5 months ago

There is a reason why we don't want to do that. This change was made in this commit: https://github.com/pepkit/peppy/commit/3d5495b0241a117cdfb07ee60e15c6a3e2a516eb

So I think this is correct behaviour

donaldcampbelljr commented 4 months ago

After discussion, it was determined that this is something that should be solved within Peppy. We will return to using sample.sample_name as a way to easily identify the sample identifier regardless of whether or not the user provided a different identifier via sample_table_index. Generally, we want the user to call items via bracket notation sample[sample_name] (since attmap has been removed). However, sample_name will be elevated such that the user can easily get the sample identifier via attribute access, e.g. sample.sample_name.

This fix should happen in the above previously referenced Peppy code, and Looper changes should be reverted to access the same.sample_name attribute.