welch-lab / liger

R package for integrating and analyzing multiple single-cell datasets
GNU General Public License v3.0
381 stars 78 forks source link

U algorithm #221

Closed cvanderaa closed 3 years ago

cvanderaa commented 3 years ago

Hello @Akriebs !

Thank you very much for the amazing UINMF algorithm. I really enjoyed reading the preprint!

I found a very small bug that generates an error when one of the modalities has no unshared features. I here provide the fix that helped me run the code on my data. I thought this might be useful to you too. The fix accounts for the fact that is.null(logical(0)) returns FALSE. Using length(...) == 0 fill return TRUE for both NULL and logical(0)

Good luck with the submission process!

Best, Chris

Akriebs commented 3 years ago

Good morning Chris, Thank you very much for bringing this to our attention, and for your suggested solution. While I plan on merging your suggestion, I was hoping you might share your code or a toy example of your preprocessing steps. On my end, I have gotten it to run on datasets with only one set of shared features. Are you listing both datasets as having the potential for unshared features, and only one of them does? I just want to be able to update the tutorial for greater clarity if there's any issues there, and ensure that this won't be an issue for future users. Many thanks, April

cvanderaa commented 3 years ago

Hello April,

I'm working on a vignette to showcase how to apply integration methods to multi-modal datasets. I can surely share it with you once I'm done.

I'm integrating 2 datasets: scRNA-Seq and mass-spec based sc proteomics (MS-SCP). Yes, I was listing both datasets as having the potential for unshared features, but the selectGene could only select unshared features for one of the two. I didn't realize it immediately, but my second dataset (MS-SCP) does not fulfill the assumption that the feature expression approximately follows a Poisson distribution. And so I continued my analysis with unshared features only for the scRNA-Seq data (which needed the fix suggested above).

I must admit I don't like the feature selection procedure provided by liger. First, because I found the R documentation rather poor. Second, because I think it lacks flexibility. Therefore, I now perform the features selection outside of liger and provide the shared and unshared features directly through the var.genes and var.unshared.features slots of the liger object.

Akriebs commented 3 years ago

Hi Chris, If you are using only one of the datasets for gene selection, did you try entering only that dataset in as an unshared dataset, i.e unshared.datasets = list(1), if that was the dataset you wanted to use? and then if you wanted to use both unshared.datasets = list(1,2)? I apologize that your user experience was sub-optimal, and I hope to improve the documentation to improve this per your recommendations. Also, for the selectGenes function lacking flexibility, what are some of the changes that could be made to make it more diverse and applicable to your particular application (i.e. would you recommend making it amenable to more methods of feature selection to accommodate for different types of features, etc)?

cvanderaa commented 3 years ago

Hi April,

Thanks for your consideration! I hope my comments are not causing you too much trouble.

In the initial case, I have been using unshared.datasets = list(1,2). But maybe I could have avoided the error with only list(1)?

I think that more flexibility could be added by providing functions to set var.genes and var.unshared.features directly (with validity checks to make sure the user is doing the right thing). With directly I mean without needing to call to selectGenes which, as far as I understand, is more suited for scRNA-Seq data. This is how I now use it in the end, but it took me some time to dive into the code and understand which slots I needed to adapt. Like this, you offer the user the ability to use their own selection method.

Of course, that's my opinion and works best in my case. I won't dare to claim that's the best thing to do ;-)