kfuku52 closed 3 years ago
Hm. I introduced two calls of unique() to check for cases in which all samples of a tissue belong to the same bioproject. These calls run on the metadata table. I suspect whatever causes those cases may also cause problems in the metadata, which results in the error.
Could you point out which unique() calls you think are suspicious?
OK, maybe I found it. The if condition you originally introduced looks like this:
https://github.com/kfuku52/amalgkit/blob/cbd6852060319083283ca9f062a106709c97e63d/amalgkit/transcriptome_curation.r#L275
I don't know why unique() is required here, because sra2_run_other_bp is a data.frame, not a vector. But more importantly, length() returns the number of columns, not rows, for a data.frame, so it always returns a fixed number no matter how many SRA runs remain there. Could you tell me how you tested this code? If I can improve your coding environment, I'm happy to help.
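To make the length() point concrete, here is a small sketch on a toy table. The table `sra` and its column names are made up for illustration and are not the real metadata layout.

```r
# Toy stand-in for the metadata table; columns and values are assumptions.
sra <- data.frame(
  run = c("SRR1", "SRR2", "SRR3", "SRR4"),
  tissue = c("flower", "flower", "leaf", "leaf"),
  bioproject = c("PRJ1", "PRJ1", "PRJ1", "PRJ2"),
  stringsAsFactors = FALSE
)

n_cols <- length(sra)                 # a data.frame is a list of columns: 3
n_cols_unique <- length(unique(sra))  # unique() drops duplicate rows, so still 3
n_rows <- nrow(sra)                   # the number of runs: 4

# Subsetting rows changes nrow() but never length().
leaf <- sra[sra$tissue == "leaf", , drop = FALSE]
leaf_rows <- nrow(leaf)    # 2
leaf_len <- length(leaf)   # 3
```

So a condition built on length() of a data.frame cannot depend on how many runs remain; nrow() is probably what was meant.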
I used browser() to enter debug mode and see which branch of the if condition the function takes when running on my dataset. I used my Helianthus dataset, where all flower samples came from the same bioproject.
I set my_tissue to flower, ran it through browser(), checked which branch of the if statement I ended up in, and then did the same with leaf.
For my test set this works, because sra2_run_other_bp[sra2_run_other_bp$tissue == my_tissue] returns a data.frame with 0 columns when my_tissue == flower. This was certainly not how I intended this to work, but I didn't notice, because it actually did what it was supposed to do.
Oh, and I just realized why it returns a data.frame with 0 columns. Without a comma inside the brackets, the logical vector selects columns rather than rows. sra2_run_other_bp$tissue == my_tissue is FALSE for every row, so sra2_run_other_bp[sra2_run_other_bp$tissue == my_tissue] ends up with no columns at all.
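A minimal reproduction of this behaviour on a toy table (names and values are made up, not taken from the real metadata). It also shows one way the comma-less form can fail on other data; that is only an illustration of base R indexing, not a confirmed diagnosis of the reported error.

```r
# Toy metadata table; the column names and values are assumptions.
df <- data.frame(
  run = c("SRR1", "SRR2", "SRR3", "SRR4"),
  tissue = c("leaf", "leaf", "root", "root"),
  bioproject = c("PRJ1", "PRJ2", "PRJ1", "PRJ2"),
  stringsAsFactors = FALSE
)

# No comma: the logical vector is treated as a column selector.
no_match <- df[df$tissue == "flower"]
no_match_cols <- ncol(no_match)  # 0, because every element of the mask is FALSE
no_match_rows <- nrow(no_match)  # 4, the rows are all still there

# A TRUE at a position beyond the last column asks for a column that
# does not exist, so the same expression raises an error here.
res <- tryCatch(df[df$tissue == "root"], error = function(e) e)
raised_error <- inherits(res, "error")  # TRUE

# With the comma, the same masks select rows as intended.
flower_rows <- nrow(df[df$tissue == "flower", , drop = FALSE])  # 0
root_rows <- nrow(df[df$tissue == "root", , drop = FALSE])      # 2
```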
In this particular case, I'd suggest first checking what's stored in sra2_run_other_bp$tissue == my_tissue and then sra2_run_other_bp[sra2_run_other_bp$tissue == my_tissue] (the error should be detected here) before passing them to other functions like length(unique(sra2_run_other_bp[sra2_run_other_bp$tissue == my_tissue])).
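The step-by-step check could look roughly like the sketch below. count_tissue_runs() is a hypothetical helper written for this comment, not an existing amalgkit function, and the toy table only mimics the tissue column of the real metadata.

```r
# Hypothetical helper (not part of amalgkit): count the runs of one tissue,
# validating each intermediate object before using it.
count_tissue_runs <- function(sra_table, tissue_name) {
  # Step 1: the mask should hold one logical value per row.
  mask <- sra_table$tissue == tissue_name
  stopifnot(is.logical(mask), length(mask) == nrow(sra_table))

  # Step 2: subset rows (note the comma); which() drops NA entries.
  rows <- sra_table[which(mask), , drop = FALSE]
  stopifnot(is.data.frame(rows), ncol(rows) == ncol(sra_table))

  # Step 3: only now reduce to a single number, using nrow(), not length().
  nrow(rows)
}

# Toy table with one missing tissue label; values are assumptions.
toy <- data.frame(
  run = c("SRR1", "SRR2", "SRR3", "SRR4"),
  tissue = c("flower", "leaf", "leaf", NA),
  bioproject = c("PRJ1", "PRJ1", "PRJ2", "PRJ2"),
  stringsAsFactors = FALSE
)

n_flower <- count_tissue_runs(toy, "flower")  # 1
n_leaf <- count_tissue_runs(toy, "leaf")      # 2
n_root <- count_tissue_runs(toy, "root")      # 0
```

The second stopifnot() would have caught the 0-column result right away, because a row subset must keep every column of the original table.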
And of course, we need a good test dataset. https://github.com/kfuku52/amalgkit/issues/41
This error occurred in curate with a very large Apis dataset, but not in its subset. I'll investigate more next week.