Closed B0ydT closed 6 months ago
Does the function works with an arbitrary set of variables?
Does the function works with an arbitrary set of variables?
It does now!
I refactored the code in an attempt to fix the error. The function works when I run it myself and during unit tests but fails when I rebuild and run R CMD Check. It throws an error saying unite
doesn't like tidySCE objects, but I have explicitly converted the object to a tibble, so I don't understand why that is still happening.
Hello @B0ydT, any news about this PR? Let me know if you need help/more explanation.
ping
Thanks for your feedback. It was very clear and I addressed most of it. I'm not 100% sure where you want to use select, though.
I'm still getting an R CMD Check error for the call to unite
. The function works just fine, but generates an error when checked.
Edit: My example was pbmc_small |> group_split(pbmc_small, groups)
🤦♂️
I'm not sure it's exactly the method you had in mind, but I think I've fixed it. I totally forgot those logical statements in group_by
exist, as I never really use them myself.
I saw a lot of your other methods make use of the original dplyr functions. I had given up on group_split
, because I didn't see how a list of tbl
s would help me split the SCE object, but I just came across group_rows
, which allows you to extract the indices from a grouped tbl
!
It does not add those "PC_1>0" type columns with logical values yet, but I should be able to add those shortly.
Have added the tests.
I am trying to sidestep all of the name corrections so that the names of new columns are consistent with what you'd get from the dplyr
functions, i.e. groups=="g1"
. This is also necessary so that they can be dropped when .keep = FALSE
.
The closest I've gotten is
colData(.tbl) <- .tbl |>
colData() |>
as_tibble() |>
dplyr::mutate(!!!var_list) |>
DataFrame(check.names = FALSE)
I can use colData(.tbl)
to see that the names made it in unchanged, but any of the dplyr
methods for SCEs 'fix' the names. I could, of course, add the columns after the split, but I don't think anyone wants to see groups.....g1.
in their column names, and I'd still need to rethink the .keep
method. Of course, if the underlying package heavily relies on the assumption that names will all be correct
, then my ideal solution may not be viable.
Have added the tests.
I am trying to sidestep all of the name corrections so that the names of new columns are consistent with what you'd get from the
dplyr
functions, i.e.groups=="g1"
. This is also necessary so that they can be dropped when.keep = FALSE
.The closest I've gotten is
colData(.tbl) <- .tbl |> colData() |> as_tibble() |> dplyr::mutate(!!!var_list) |> DataFrame(check.names = FALSE)
I can use
colData(.tbl)
to see that the names made it in unchanged, but any of thedplyr
methods for SCEs 'fix' the names. I could, of course, add the columns after the split, but I don't think anyone wants to seegroups.....g1.
in their column names, and I'd still need to rethink the.keep
method. Of course, if the underlying package heavily relies on the assumption that names will all becorrect
, then my ideal solution may not be viable.
don't see the problem. this looks good to me
pbmc_small |> group_split(PC_1>0 & groups == "g2") %>% .[[1]] |> select(groups)
tidySingleCellExperiment says: Key columns are missing. A data frame is returned for independent data analysis.
# A tibble: 75 × 1
groups
<chr>
1 g2
2 g1
3 g2
4 g2
5 g2
6 g1
7 g1
8 g1
9 g1
10 g1
If it behaves well enough for the vast majority of use cases, I would say let's go with this, and we can improve it in the future.
It would be good to translate this to tidyseurat
, and the more complicated tidySummarizedExperiment
.
Congrats @B0ydT !
Let me know what you think about repurposing your PR.
I've got a
group_split
implementation working for row and column data. I can work ongroup_by
if you're happy with it. If not, let me know where it needs work.I was a bit stuck overthinking some decisions but just decided to go for it, so I'm very happy to change things up. For example, I wasn't sure if it would be preferable to specify grouping by row/column or autodetecting like I ended up doing.
Issue #71