lcdb / lcdb-wf

Robust, tested workflows for RNA-seq, ChIP-seq and other high-throughput sequencing analysis
https://lcdb.github.io/lcdb-wf

Layout rather than LibraryLayout #407

Open daler opened 1 month ago

daler commented 1 month ago

Sampletables downloaded from SRA recently seem to use Layout rather than LibraryLayout as a column name, which breaks the SRA detection. Confirm this is the case, and make the SRA detection more robust, for example by accepting either column name.
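A minimal sketch of the "accept either column name" suggestion, in Python since the workflow helpers (lib/helpers.py, lib/common.py) are Python. The function name find_layout_column and the tuple of accepted names are hypothetical, not lcdb-wf's actual code. It returns whichever layout column is present and refuses tables that carry more than one, in the same spirit as the existing "not both" check reported later in this thread.

```python
# Hypothetical helper, NOT lcdb-wf's actual implementation: a sketch of
# looking for any one of several accepted layout column names.
LAYOUT_COLUMNS = ("layout", "Layout", "LibraryLayout")


def find_layout_column(columns):
    """Return the single layout column present in `columns`.

    Raises ValueError if none, or more than one, of the accepted names is found.
    """
    found = [c for c in LAYOUT_COLUMNS if c in columns]
    if not found:
        raise ValueError(
            "No layout column found; expected one of: " + ", ".join(LAYOUT_COLUMNS)
        )
    if len(found) > 1:
        raise ValueError("Expected only one layout column, found: " + ", ".join(found))
    return found[0]


# Older SRA run tables use LibraryLayout; newer ones appear to use Layout.
print(find_layout_column(["Run", "LibraryLayout", "Organism"]))  # LibraryLayout
print(find_layout_column(["Run", "Layout", "Organism"]))  # Layout
```

Keeping the accepted names in one tuple means a future SRA renaming only needs a one-line change.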

menoldmt commented 1 month ago

I can confirm this is an issue. I was only able to run ./run_complex_test.sh -j16 after removing the LibraryLayout column from ../../test/test_configs/complex-dataset-rnaseq-sampletable.tsv and replacing it with a layout column set to SE in every row.

menoldmt commented 1 month ago

My first error message after running ./run_complex_test.sh -j16 without editing the sampletable:

ValueError in file /gpfs/gsfs10/users/NICHD-core0/test/menoldmt/expression-mod/workflows/rnaseq/Snakefile, line 30:
Sampletable appears to be SRA, but no 'Layout' column found. This is required to specify single- or paired-end libraries.
  File "/gpfs/gsfs10/users/NICHD-core0/test/menoldmt/expression-mod/workflows/rnaseq/Snakefile", line 30, in <module>
  File "/gpfs/gsfs10/users/NICHD-core0/test/menoldmt/expression-mod/workflows/rnaseq/../../lib/patterns_targets.py", line 106, in __init__
  File "/gpfs/gsfs10/users/NICHD-core0/test/menoldmt/expression-mod/workflows/rnaseq/../../lib/patterns_targets.py", line 78, in __init__
  File "/gpfs/gsfs10/users/NICHD-core0/test/menoldmt/expression-mod/workflows/rnaseq/../../lib/helpers.py", line 20, in detect_layout
  File "/gpfs/gsfs10/users/NICHD-core0/test/menoldmt/expression-mod/workflows/rnaseq/../../lib/helpers.py", line 20, in <listcomp>
  File "/gpfs/gsfs10/users/NICHD-core0/test/menoldmt/expression-mod/workflows/rnaseq/../../lib/common.py", line 732, in is_paired_end

My second error message, after running ./run_complex_test.sh -j16 again with a lowercase layout column added alongside LibraryLayout (even though the first error message referred to 'Layout'):

ValueError in file /gpfs/gsfs10/users/NICHD-core0/test/menoldmt/expression-mod/workflows/rnaseq/Snakefile, line 30:
Expecting column 'layout' or 'LibraryLayout', not both
  File "/gpfs/gsfs10/users/NICHD-core0/test/menoldmt/expression-mod/workflows/rnaseq/Snakefile", line 30, in <module>
  File "/gpfs/gsfs10/users/NICHD-core0/test/menoldmt/expression-mod/workflows/rnaseq/../../lib/patterns_targets.py", line 106, in __init__
  File "/gpfs/gsfs10/users/NICHD-core0/test/menoldmt/expression-mod/workflows/rnaseq/../../lib/patterns_targets.py", line 78, in __init__
  File "/gpfs/gsfs10/users/NICHD-core0/test/menoldmt/expression-mod/workflows/rnaseq/../../lib/helpers.py", line 20, in detect_layout
  File "/gpfs/gsfs10/users/NICHD-core0/test/menoldmt/expression-mod/workflows/rnaseq/../../lib/helpers.py", line 20, in <listcomp>
  File "/gpfs/gsfs10/users/NICHD-core0/test/menoldmt/expression-mod/workflows/rnaseq/../../lib/common.py", line 741, in is_paired_end
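Both tracebacks end in is_paired_end, which has to interpret the layout values once the column is found. The sketch below is hypothetical and is not the actual is_paired_end() in lib/common.py; it assumes SRA-style values (PAIRED/SINGLE) and the short forms PE/SE (SE being the value used in the edited sampletable), matched case-insensitively.

```python
# Hypothetical sketch, NOT lcdb-wf's is_paired_end(): map one layout value
# to True (paired-end) or False (single-end).
# Assumption: SRA uses PAIRED/SINGLE; hand-written sampletables use PE/SE.
PAIRED_VALUES = {"paired", "pe"}
SINGLE_VALUES = {"single", "se"}


def is_paired_value(value):
    """Return True for paired-end, False for single-end; raise otherwise."""
    v = str(value).strip().lower()
    if v in PAIRED_VALUES:
        return True
    if v in SINGLE_VALUES:
        return False
    raise ValueError(f"Unrecognized layout value: {value!r}")


print(is_paired_value("PAIRED"))  # True
print(is_paired_value("SE"))  # False
```

Raising on unknown values, rather than defaulting to single-end, makes a malformed sampletable fail early instead of silently producing the wrong alignment setup.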

To fix this, I then removed the original LibraryLayout column, leaving only the layout column with SE in every row.
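The manual workaround can also be scripted. This is a sketch using only Python's standard csv module; replace_layout_column is a hypothetical helper, and writing SE to every row assumes every sample is single-end, matching the value used in the workaround above. The example row is made up for illustration.

```python
import csv
import io


def replace_layout_column(tsv_text, value="SE"):
    """Drop LibraryLayout (and any existing layout) and add a 'layout' column
    set to `value` in every row. Assumes a tab-separated table with a header."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Keep all other columns in their original order, then append 'layout'.
    fieldnames = [
        f for f in reader.fieldnames if f not in ("LibraryLayout", "layout")
    ] + ["layout"]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row.pop("LibraryLayout", None)
        row["layout"] = value
        writer.writerow(row)
    return out.getvalue()


# Hypothetical example table with one SRA-style row.
example = "samplename\tRun\tLibraryLayout\nsample1\tSRR0000001\tSINGLE\n"
print(replace_layout_column(example), end="")
```

To apply it, read complex-dataset-rnaseq-sampletable.tsv, pass its contents to replace_layout_column, and write the result to a copy before replacing the original.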