kfuku52 / amalgkit

RNA-seq data amalgamation for large-scale evolutionary transcriptomics
BSD 3-Clause "New" or "Revised" License

--batch base #51

Closed kfuku52 closed 2 years ago

kfuku52 commented 3 years ago

Currently, --batch takes a zero-based index, so on SGE, for example, array jobs have to be submitted like --batch $[${SGE_TASK_ID}-1]. I'm wondering whether it would be better for the index to start from one rather than zero. A one-based index also makes it easier to locate data when browsing the metadata table in Excel. @Hego-CCTB If there is no issue, I'd like to change it.
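
To illustrate the proposal, here is a minimal Python sketch of a one-based lookup, under which an SGE task ID could be passed through unchanged. The helper name `select_row` and the sample rows are hypothetical and are not taken from the amalgkit code base.

```python
def select_row(rows, batch):
    """Return the metadata row for a one-based batch value (hypothetical helper)."""
    if not 1 <= batch <= len(rows):
        raise ValueError(f"batch must be between 1 and {len(rows)}, got {batch}")
    # Convert the one-based batch value to Python's zero-based list index.
    return rows[batch - 1]


# Made-up metadata rows, for illustration only.
rows = [
    {"run": "SRR0001"},
    {"run": "SRR0002"},
    {"run": "SRR0003"},
]

# With a one-based index, batch 1 is the first data row of the table.
print(select_row(rows, 1)["run"])
```

With this convention the batch value counts data rows from one, so no subtraction is needed in the job script.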

Hego-CCTB commented 3 years ago

sure! no problem w/ me

kfuku52 commented 3 years ago

Applied the change to getfastq and quant in https://github.com/kfuku52/amalgkit/commit/62ac91b798af63055cee842bbb3e44bc4f8d8cd8

@Hego-CCTB Could you take care of --batch in curate? Currently, the --batch value does not appear to be used at all except in a few "is not None" checks. As we discussed previously, curate should choose the species and detect all necessary input automatically according to --batch.

Hego-CCTB commented 3 years ago

sure, can do!

kfuku52 commented 3 years ago

related bug fix https://github.com/kfuku52/amalgkit/commit/eeac07694a5d6ad775cb20617e06c40e3d6bf89d

kfuku52 commented 2 years ago

@Hego-CCTB bump

Hego-CCTB commented 2 years ago

I will take care of this soon. I have tons of data lined up for amalgkit curate which I'd like to run --batch on.

Hego-CCTB commented 2 years ago

Added this here, but never made the closing report: https://github.com/kfuku52/amalgkit/commit/b397f1a9b9661d9cedf53a0bc201acc44a73df85

amalgkit curate --batch works similarly to the other amalgkit batch commands, just on a per-species basis rather than a per-sample basis. For example, if you have 3 species in your metadata, amalgkit curate --batch 2 will try to process the 2nd species in the metadata sheet (in order of first occurrence).
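
The species selection described above can be sketched roughly as follows. This is only an illustration of "order of first occurrence" with a one-based index; the function name `select_species` and the example column are hypothetical and do not reflect the actual implementation in the commit.

```python
def select_species(species_column, batch):
    """Pick the batch-th distinct species, in order of first occurrence (hypothetical helper)."""
    # dict.fromkeys drops duplicates while preserving insertion order.
    unique_species = list(dict.fromkeys(species_column))
    if not 1 <= batch <= len(unique_species):
        raise ValueError(
            f"batch must be between 1 and {len(unique_species)}, got {batch}"
        )
    # One-based batch value to zero-based list index.
    return unique_species[batch - 1]


# Made-up species column with 3 distinct species, for illustration only.
species_column = [
    "Homo sapiens",
    "Homo sapiens",
    "Mus musculus",
    "Homo sapiens",
    "Danio rerio",
]

print(select_species(species_column, 2))  # prints "Mus musculus"
```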