Closed by Hego-CCTB 2 years ago
## `amalgkit csca`
amalgkit csca \
--out_dir PATH_TO_WORKING_DIRECTORY \
--file_species_tree PATH_TO_NWK_FILE \
--file_singlecopy PATH_TO_ORTHOFINDER_FILE \
--file_orthogroup PATH_TO_ORTHOFINDER_FILE \
--dir_uncorrected_curate_group_mean PATH_TO_CURATE_TABLES \
--dir_curate_group_mean PATH_TO_CURATE_TABLES \
--dir_sra PATH_TO_CURATE_TABLES \
--dir_tc PATH_TO_CURATE_TABLES \
--curate_group 'root,flower,leaf'
- Note: This was tested on a nine-species plant dataset retrieved, quantified, and curated by `amalgkit`. Further testing is needed; the gene name format in particular can cause issues.
- Note: `dir_uncorrected_curate_group_mean`, `dir_curate_group_mean`, `dir_sra`, and `dir_tc` all point to the same directory if the input is unchanged `curate` output. As such, these arguments are `inferred` by default: if there is a `curate/tables` folder in the `--out_dir` path, amalgkit will find those files on its own.
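The inference described in the second note can be pictured as a simple fallback: an explicitly passed directory wins, otherwise `curate/tables` under `--out_dir` is used. Below is a minimal Python sketch of that idea. `resolve_curate_dir` is a hypothetical helper, and treating the string `inferred` as the "not set" value is an assumption; this is not amalgkit's actual code.

```python
import os


def resolve_curate_dir(out_dir, explicit_dir="inferred"):
    """Return the directory that holds the curate tables.

    Hypothetical helper: an explicitly given path takes precedence;
    otherwise fall back to <out_dir>/curate/tables if it exists.
    """
    # Assumption: "inferred" (or None) means the user did not pass a path.
    if explicit_dir not in (None, "inferred"):
        return explicit_dir
    candidate = os.path.join(out_dir, "curate", "tables")
    if os.path.isdir(candidate):
        return candidate
    raise FileNotFoundError(f"No curate tables found at {candidate}")
```

With this logic, the four `--dir_*` arguments only need to be passed when the `curate` output has been moved or modified.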
## `amalgkit curate`
- Now throws a warning when transforming with TPM
- Now throws an error when `cstmm` output files are detected (parsed from path) in combination with TPM transformation
- Now includes the option `--one_outlier_per_iter yes|no`, which limits outlier removal to one sample per bioproject or per tissue in each iteration
- `check_within_tissue_correlation()` now removes samples with a Pearson's r below 0.2 (currently hard-coded, but this can be made an optional input in the future)
- `--cleanup 0|1` is now `--plot_intermediate yes|no`. "yes" calculates and prints the SVA correction after every single iteration of outlier removal. This can drastically increase runtimes.
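To illustrate how the 0.2 cutoff and `--one_outlier_per_iter` could interact, here is a minimal Python sketch of one removal iteration. It is not the code in `transcriptome_curation.r`. The function name, the dictionary keys, and the reading of "one sample per bioproject or tissue" (skip a candidate if either its bioproject or its tissue already lost a sample in this iteration) are all assumptions.

```python
R_THRESHOLD = 0.2  # hard-coded cutoff mentioned in the changelog


def pick_outliers(samples, one_outlier_per_iter=True, threshold=R_THRESHOLD):
    """Select the runs to drop in a single iteration.

    `samples` is a list of dicts with the hypothetical keys
    'run', 'bioproject', 'tissue' and 'r' (within-tissue Pearson's r).
    """
    # Candidates are all samples below the cutoff, worst first.
    candidates = sorted(
        (s for s in samples if s["r"] < threshold), key=lambda s: s["r"]
    )
    if not one_outlier_per_iter:
        return [s["run"] for s in candidates]

    seen_bioprojects, seen_tissues = set(), set()
    picked = []
    for s in candidates:
        # Assumed interpretation: at most one removal per bioproject
        # and per tissue within the same iteration.
        if s["bioproject"] in seen_bioprojects or s["tissue"] in seen_tissues:
            continue
        picked.append(s["run"])
        seen_bioprojects.add(s["bioproject"])
        seen_tissues.add(s["tissue"])
    return picked
```

Samples skipped in one iteration stay in the dataset and are re-evaluated in the next, which is why the "yes" setting removes outliers more conservatively.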
## `amalgkit getfastq`
- truncated the updated_metadata output files to only the columns essential for `curate`. This has two benefits: smaller file size (which very slightly improves `curate` performance) and, more importantly, the same number of columns across all individual files
- obsoleted `--ascp` and all related options
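The column truncation can be sketched as follows. This is only an illustration in Python: the column list in `ESSENTIAL_COLUMNS` is hypothetical (the changelog does not name the columns that are kept), and `truncate_metadata` is not an amalgkit function.

```python
import csv
import io

# Hypothetical column list; the real set of essential columns is not
# stated in this changelog.
ESSENTIAL_COLUMNS = ["run", "bioproject", "curate_group"]


def truncate_metadata(tsv_text, keep=ESSENTIAL_COLUMNS):
    """Return a TSV that contains exactly the columns in `keep`.

    Columns missing from the input are written as empty fields, so every
    output file ends up with the same number of columns.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=keep, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow({col: row.get(col, "") for col in keep})
    return out.getvalue()
```

Because every file is written with the same header, per-sample tables can later be concatenated without column alignment problems.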
## `amalgkit`
- added `amalgkit csca` subparsers
This should go up later today. I'm still debugging and I have to merge with the other updates today.
Is there any option like `--curate_group all` to include all curate groups in the metadata table?
If `--curate_group` is left `none`, it should parse out all unique values from the `curate_group` column and use that as input.
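A rough Python sketch of that fallback, assuming the metadata rows are available as dictionaries and an unset argument arrives as `None`; `resolve_curate_groups` is a hypothetical name, not an existing amalgkit function:

```python
def resolve_curate_groups(arg_value, metadata_rows):
    """Return the list of curate groups to process.

    If the argument was given, split the comma-separated string;
    otherwise collect the unique, non-empty values of the
    'curate_group' column from the metadata rows.
    """
    if arg_value is not None:
        return [g.strip() for g in arg_value.split(",") if g.strip()]
    groups = {
        row["curate_group"] for row in metadata_rows if row.get("curate_group")
    }
    # Sorted so the result does not depend on row order.
    return sorted(groups)
```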
Sounds good!
The `curate_group` column is missing in the metadata table. Could you update `amalgkit metadata`?
Ah, it seems the column doesn't survive the last metadata step. There are 3 metadata sheets as output. `curate_group` is in the second output, but not in the third.
I'll investigate that.
It seems that `curate_group` isn't used at all in `transcriptome_curation.r`. Am I missing something?
Yeah, you are right. I'm gonna need to replace any reference to tissue with curate_group.
This is going to be a bigger update, affecting multiple currently open issues. So I'll post the changelog in here and refer to this from the other issues.