stemangiola / tidySingleCellExperiment

Brings SingleCellExperiment objects to the tidyverse
https://stemangiola.github.io/tidySingleCellExperiment/index.html
35 stars 10 forks source link

revise unit tests `tidySingleCellExperiment` #79

Closed HelenaLC closed 1 year ago

HelenaLC commented 1 year ago
stemangiola commented 1 year ago
  • revised tests (including plotting) bringing us to ~90% coverage
  • tests are a little more complex/data-driven now (e.g., no hard coded numbers, in case we change the example data)
  • also fixed a potential bug in rename() (renaming special columns, e.g., .cell, previously gave an error that the column doesn't exists, but should work as intended now)
  • also updated the example data (pbmc_small) via BiocGenerics::updateObject()) as it was outdated, causing some tests to fail (due to differences in internal metadata)

Thanks @HelenaLC ! That's great. I left 3 comments with some little changes or explanations.

Love the PR.

HelenaLC commented 1 year ago

Also, sorry to be a pain. Since we are changing the test already, could I please ask to use base-R pipe?

I am trying to get rid of dplyr pipe wherever I can.

Yes, absolutely! I actually prefer not using any pipes internally, but didn't want to fiddle too much with the style. But agreed, I'll use base R pipes instead of %>%.

(Side note: perhaps we could have a brief chat re linting. I am mainly focusing on Bioc-guidelines plus some simplifications, and I am leaning towards more base R vs. tidy-style, which I think would make things cleaner (and briefer) in many (not all) cases, at least internally...)

stemangiola commented 1 year ago

Also, sorry to be a pain. Since we are changing the test already, could I please ask to use base-R pipe? I am trying to get rid of dplyr pipe wherever I can.

Yes, absolutely! I actually prefer not using any pipes internally, but didn't want to fiddle too much with the style. But agreed, I'll use base R pipes instead of %>%.

Base pipes after R evaluation they disappear. It is just a visual-coding aid. So makes no difference for R.

(Side note: perhaps we could have a brief chat re linting. I am mainly focusing on Bioc-guidelines plus some simplifications, and I am leaning towards more base R vs. tidy-style, which I think would make things cleaner (and briefer) in many (not all) cases, at least internally...)

Yes thta's a more complex topic. We could discuss about that. It is good to respect standards, but we are practically contributing to changing them. And it is becoming evident to many. When I think about standards, I try to imagine what they will be in 5-10 years. Things are moving fast :)

Personal opinion: I think pipes make the code more readable in virtually all cases.

HelenaLC commented 1 year ago

Also, sorry to be a pain. Since we are changing the test already, could I please ask to use base-R pipe? I am trying to get rid of dplyr pipe wherever I can.

Yes, absolutely! I actually prefer not using any pipes internally, but didn't want to fiddle too much with the style. But agreed, I'll use base R pipes instead of %>%.

Base pipes after R evaluation they disappear. It is just a visual-coding aid. So makes no difference for R.

(Side note: perhaps we could have a brief chat re linting. I am mainly focusing on Bioc-guidelines plus some simplifications, and I am leaning towards more base R vs. tidy-style, which I think would make things cleaner (and briefer) in many (not all) cases, at least internally...)

Yes thta's a more complex topic. We could discuss about that. It is good to respect standards, but we are practically contributing to changing them. And it is becoming evident to many. When I think about standards, I try to imagine what they will be in 5-10 years. Things are moving fast :)

Personal opinion: I think pipes make the code more readable in virtually all cases.

Absolutely. I did not mean to speak against pipes in general (illusional or not). Currently, I am trying my best to keep them for workflow-like operations, where they do make things easier to write and read. And only not use them when it really is a lot easier to follow (in my opinion), e.g., I struggled more to understand things like ... |> colnames() |> duplicated() |> which() |> length() |> gt(0) or ... |> duplicated() |> any() |> '!'() as opposed to !any(duplicated(colnames(...))), which literally reads "not any duplicated colnames". All that to say: Yes, I agree that they are easier to read in many cases, but perhaps not all (and also more cumbersome to debug at times...).

Independent of personal opinions / style-preferences, my case against using dplyr/tidyr inside packages (unless it does make things a LOT easier) is mainly because they are quite volatile at times. I have had to make quite frequent packages updates because of some syntax changes, deprecations etc. throwing warnings. So find it best to avoid them for simple things when possible without much magic.

Aaanyways, hoping to finish a first iteration first (towards a clean-ish BiocCheck) and we can see from there and change things that are not appreciated. Like I said, I am not against pipes (use them a lot!) and am keeping them in the vast majoirty of cases (like ~90%). Mainly trying to assure consistent use of <-/=, spacing, line-width, not using @ for accession, etc.

*One example re the above, when() has been deprecated in favor of base R if (...) ... else(...) ... :/

stemangiola commented 1 year ago

is mainly because they are quite volatile at times. I have had to make quite frequent packages updates because of some syntax changes, deprecations etc

true

HelenaLC commented 1 year ago

Just wondering before I finalize the next one, what's happening with this PR? Is there something you'd like to be changed explicitly (and perhaps kept for another PR) before merging?

stemangiola commented 1 year ago

I see.

So .cell is an immutable column.

A solution can be

  • colData() |> as_tibble(rownames = EXISTING_INTERNAL_FUNCTION_THAT_ENCODES_CELL_COLNAME)
  • rename
  • drop .cell column so it's untouched
  • warning ("tidySingleCellExperiment says: ...") If .cell is in the query (there could be multiple columns in the query)

Hello @HelenaLC, is the above still pending? I think this is the last part missing, to be modified fro your PR. All the rest look fantastic.

HelenaLC commented 1 year ago

Alright. I implemented a fix that I think does what it should, i.e., special columns (.cell, PCs etc.) cannot be renamed NOR be used as target names; it is a little more complicated than what was there before, but does the trick.

stemangiola commented 1 year ago

Alright. I implemented a fix that I think does what it should, i.e., special columns (.cell, PCs etc.) cannot be renamed NOR be used as target names; it is a little more complicated than what was there before, but does the trick.

Great, thanks for that! We were not getting the expected error, so I slightly changed the logic.

Also, in the future, we can check expect_error specifying the expected error message.

Thanks for this PR!