pepkit / looper

A job submitter for Portable Encapsulated Projects
http://looper.databio.org
BSD 2-Clause "Simplified" License
20 stars 7 forks source link

looper isn't working with sample_table_index #458

Closed nsheff closed 1 month ago

nsheff commented 4 months ago

Using a basic project with a sample table index fails, due to this line:

https://github.com/pepkit/looper/blob/1468956dde66abf5b853c80eaeaee2d411bfad64/looper/looper.py#L458

it's expecting there to be a sample_name column, which there is not, since I'm using sample_table_index.

This makes looper not fully compliant with PEP.

donaldcampbelljr commented 4 months ago

I believe this is a Peppy issue: https://github.com/pepkit/peppy/issues/459

nsheff commented 4 months ago

I don't think it's a peppy issue -- it's looper code that is referring to sample.sample_name

are you suggesting that peppy make available a .sample_name attribute to refer to wherever the index is?

donaldcampbelljr commented 4 months ago

I don't think it's a peppy issue -- it's looper code that is referring to sample.sample_name

are you suggesting that peppy make available a .sample_name attribute to refer to wherever the index is?

Yes. I thought that was the original functionality.

Some digging, for reference to this issue: https://github.com/pepkit/peppy/commit/3d5495b0241a117cdfb07ee60e15c6a3e2a516eb

donaldcampbelljr commented 3 months ago

I added a commit here where I made a test to help troubleshoot this issue further: https://github.com/pepkit/looper/commit/8fde43a61d5042c447a6927325d91bb152e27e7e

Using a sample_table_index in the project config, I also replaced sample.sample_name in the Looper code with sample[self.prj.sample_table_index]. This was done based on previous, similar work that may not have been completed: https://github.com/pepkit/looper/commit/9ce510d4130dfcd4f0b7b04f32e6a9872cb154bd

However, I now see an Eido validation error during the test where I simply attempt to use looper run

Error: EidoValidationError (Validation failed): {"'sample_name' is a required property": [{'type': "'sample_name' is a required property", 'message': "'sample_name' is a required property on instance project", 'sample_name': 'project'}]}
["'sample_name' is a required property"]

Related test data:

project config:

name: looper_advanced_test
pep_version: 2.0.0
sample_table: annotation_sheet.csv
sample_modifiers:
  append:
    attr: val
  derive:
    attributes:
    - read1
    - read2
    sources:
      SRA_1: '{SRR}_1.fastq.gz'
      SRA_2: '{SRR}_2.fastq.gz'
sample_table_index: NOT_SAMPLE_NAME

annotation sheet <!DOCTYPE html>

NOT_SAMPLE_NAME | protocol | data_source | SRR | Sample_geo_accession | read1 | read2 -- | -- | -- | -- | -- | -- | -- sample1 | PROTO1 | SRA | SRR5210416 | GSM2471255 | SRA_1 | SRA_2 sample2 | PROTO1 | SRA | SRR5210450 | GSM2471300 | SRA_1 | SRA_2 sample3 | PROTO2 | SRA | SRR5210398 | GSM2471249 | SRA_1 | SRA_2
donaldcampbelljr commented 3 months ago

Future discussion to happen in: https://github.com/pepkit/peppy/issues/459

However, will keep this open to track for Looper.

donaldcampbelljr commented 1 month ago

Ok, I believe this PR in Peppy (https://github.com/pepkit/peppy/pull/484) will solve this issue and the corresponding issue: https://github.com/pepkit/peppy/issues/459