pepkit / looper

A job submitter for Portable Encapsulated Projects
http://looper.databio.org
BSD 2-Clause "Simplified" License
20 stars 7 forks source link

New sample table index variables in config file #338

Closed nsheff closed 1 year ago

nsheff commented 2 years ago

@stolarczyk The changelog for 0.32.0 says:

- new project attributes:
  - `sample_table_index`
  - `subsample_table_index`

This isn't really documented anywhere, but I think these are intended to be parallel to sample_table and subsample_table in the PEP project_config specification, and define the column name to be used for the identifier attribute. Am I right?

This seems to be sort of working, but with looper I get this error:

looper run
No project config defined, using: /home/nsheff/code/pephub.databio.org/peps/nfcore/demo_rna/project_config.yaml. Read from dotfile (/home/nsheff/code/pephub.databio.org/peps/nfcore/demo_rna/.looper.yaml).
Looper version: 1.4.0
Command: run
Found 2 samples with non-unique names: {'RAP1_UNINDUCED_REP2', 'WT_REP1'}. Attempting to auto-merge.
Traceback (most recent call last):
  File "/home/nsheff/.local/lib/python3.8/site-packages/attmap/ordattmap.py", line 46, in __getitem__
    return super(OrdAttMap, self).__getitem__(item)
KeyError: 'sample_name'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nsheff/.local/bin/looper", line 8, in <module>
    sys.exit(main())
  File "/home/nsheff/.local/lib/python3.8/site-packages/looper/looper.py", line 1065, in main
    p = Project(
  File "/home/nsheff/.local/lib/python3.8/site-packages/looper/project.py", line 106, in __init__
    self._samples_by_interface = self._samples_by_piface(self.piface_key)
  File "/home/nsheff/.local/lib/python3.8/site-packages/looper/project.py", line 690, in _samples_by_piface
    samples_by_piface[source].add(sample[SAMPLE_NAME_ATTR])
  File "/home/nsheff/.local/lib/python3.8/site-packages/attmap/pathex_attmap.py", line 59, in __getitem__
    v = super(PathExAttMap, self).__getitem__(item)
  File "/home/nsheff/.local/lib/python3.8/site-packages/attmap/ordattmap.py", line 48, in __getitem__
    return AttMap.__getitem__(self, item)
  File "/home/nsheff/.local/lib/python3.8/site-packages/attmap/attmap.py", line 32, in __getitem__
    return self.__dict__[item]
KeyError: 'sample_name'

I guess this is likely a looper bug, and not a peppy bug, right? It seems just that looper isn't adapted to use this new model?

Also this will, really, require an update to to the specification. For now we could mark it as experimental.

stolarczyk commented 2 years ago

TL;DR I think this can be addressed in looper. Just replace any SAMPLE_NAME_ATTR occurrences with Project.sample_table_index

I think I know what is going on here. This PEP likely uses the new functionality in PEP spec, which is custom sample table index (not sample_name). I think it was introduced in the latest, probably still unreleased, PEP spec -- when I go to "latest" docs version in PEP spec page, the "sample table index" docs are missing. They are available here: http://pep.databio.org/en/2.1.0/specification/#sample-table-specification.

So this is not a breaking change to the PEP spec. All PEPs that used sample_name will still work, as long as we update the software to use Project.sample_table_index property to determine which Sample attribute to use to refer to samples (we used sample_name up to this point).

nsheff commented 2 years ago

that makes perfect sense, thanks! Very helpful.

ayobi commented 1 year ago

Was able to replace SAMPLE_NAME_ATTR with Project.sample_table_index for a few like:

https://github.com/pepkit/looper/blob/df535be2a0bd7257ee37148cb601d688ec557909/looper/conductor.py#L54-L63

But for other's like the one below, I also receive a dictionary key error: https://github.com/pepkit/looper/blob/df535be2a0bd7257ee37148cb601d688ec557909/looper/conductor.py#L395-L397

looper run project/project_config.yaml Looper version: 1.4.0 Command: run Using default config. No config found in env var: ['DIVCFG'] Pipestat compatible: False Traceback (most recent call last): File "/home/aaronobrien/projects/looper/.venv/lib/python3.10/site-packages/attmap/ordattmap.py", line 46, in __getitem__ return super(OrdAttMap, self).__getitem__(item) KeyError: <property object at 0x7fe62ae222a0>

I'll keep digging to see why. Reading more about sample_table_index its supposed to default to sample_name so maybe what's happening is something in the code in passing in something different and therefore not recognizable.

https://github.com/pepkit/peppy/blob/8449635bb35e16748e693bdd2c4820502d7b448d/peppy/project.py#L1102-L1119

ayobi commented 1 year ago

I was able to figure it out. I will PR and commit for review.