saezlab / ccc_protocols

LIANA x Tensor-cell2cell Protocols
https://ccc-protocols.readthedocs.io
MIT License

Quick start code = manuscript code? #6

Closed: earmingol closed this issue 1 year ago

earmingol commented 1 year ago

Are we doing the quick start exactly the same as the code in the manuscript? If so, we should copy the code from the manuscript verbatim. Note that I made some modifications to the code in the manuscript, so it's not exactly the same as in the notebooks we have here.

dbdimitrov commented 1 year ago

Yep, we should keep them consistent to minimise redundancies.

dbdimitrov commented 1 year ago

Maybe we should move the liana by sample part to an appendix? Seemed much more streamlined in the quickstart.

hmbaghdassarian commented 1 year ago

Ya, it should be the same, but approach-wise:

I would say the order should probably be: finish extended tutorials --> adapt to quickstart tutorial --> add the specific code to the manuscript. I think a linear direction like this is the best way to get consistency; otherwise we're going back and forth between 3 different sources, like Daniel said. Unsure how else to resolve having the three different versions.

I actually think the most efficient way for doing this would be to have fully finalized extended tutorials (or as close to it as possible) before even starting the quickstart, and same thing for quickstart --> manuscript code. Not that the actual text in the manuscript can't be the same or very close to what we have now, but the code parts would have to be modified.

earmingol commented 1 year ago

I would finish the manuscript code first, test it, and generate the quick start from it, because I have been shaping the code to make the flow of the manuscript better (e.g. removing some parameters and leaving them just for the extended versions).

So basically the work order for me has been extended tutorials -> manuscript... so then we do manuscript -> quick start?

hmbaghdassarian commented 1 year ago

Ya, that makes sense too; it's the same thing in that case. Let's just make sure it's linear like that... I guess going back and forth is a bit unavoidable as we iteratively update things, but it's tough lol

hmbaghdassarian commented 1 year ago

@earmingol, let me know what you think about how to proceed:

I was just going to start modifying the Python QuickStart (QS) to match the manuscript, but I noticed from the first few blocks of code that there are some inconsistencies (variable names, parameter values, etc.) in the manuscript version compared to the extended tutorials (ET).

So either we should do ET --> QS --> manuscript (the current QS version I implemented is code copied directly from the ETs), or the manuscript code blocks need to be modified to match the ET (for ET --> manuscript --> QS). I think the latter option would probably be quickest, since the code blocks are already in the manuscript. I can modify those blocks in tracking mode if you agree. The only place this may be an issue is if certain discrepancies were deliberate rather than accidental oversights.
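One quick way to list discrepancies like these before deciding which ones were deliberate is a plain text diff of matching code blocks. This is only a sketch using Python's standard difflib; the two blocks, the function name build_tensor and its how parameter are hypothetical placeholders, not code from the tutorials or the manuscript:

```python
import difflib

# Hypothetical code blocks, held as text only (never executed).
# The names and parameter values are invented for illustration.
et_block = """\
tensor = build_tensor(data, how="outer")
rank = 10
"""
manuscript_block = """\
tensor = build_tensor(data)
rank = 10
"""

# Line-by-line unified diff between the two versions.
diff = list(
    difflib.unified_diff(
        et_block.splitlines(),
        manuscript_block.splitlines(),
        fromfile="extended_tutorial",
        tofile="manuscript",
        lineterm="",
    )
)

# Keep only the added/removed lines, skipping the two file-header lines.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

print("\n".join(diff))
```

Here changed holds one removed and one added line, which is the omitted parameter; each such pair can then be marked as intentional or as an oversight.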

earmingol commented 1 year ago

What kind of inconsistencies do you mean? I put some in on purpose (for example, omitting parameters since we are not explaining them in the manuscript), so it does not matter if the quick start and the extended tutorial lead to different results.

Other things I did were creating new variables, for example to pass the figure names, and changing some variable names (which I did not have time to change in the ET too) just to make the manuscript clearer, consistent within itself, and easier to follow. I would not worry about these differences, since the idea of the quick start is to be as simple as possible and avoid distractions from minor details (e.g. parameters that are not critical when using the default values).

I would say the flow of ET -> Manuscript/QS only means using the previous step as a starting point for the next step; they do not need to match. By this I mean we took the ET to create the manuscript/QS code, but we can still modify the latter to make it simpler and easier to follow.

For example, one case I remember is the context_dict: in the ET we define it manually because that notebook doesn't load the RNA-seq data, but in the manuscript the data should already be loaded, so defining the dict manually doesn't make sense and I create it directly from the adata.obs object.
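A minimal sketch of that last case, assuming context_dict maps each sample to its context label and that adata.obs has one column for each. The column names "sample" and "condition" and the values are hypothetical, and a plain pandas DataFrame stands in for adata.obs so the snippet runs on its own:

```python
import pandas as pd

# Stand-in for adata.obs (one row per cell).
# The column names "sample" and "condition" are hypothetical.
obs = pd.DataFrame(
    {
        "sample": ["s1", "s1", "s2", "s3", "s3"],
        "condition": ["control", "control", "control", "severe", "severe"],
    }
)

# Collapse to one row per sample, then map sample -> condition.
context_dict = (
    obs[["sample", "condition"]]
    .drop_duplicates()
    .set_index("sample")["condition"]
    .to_dict()
)

print(context_dict)
```

With the data already loaded, the same chain applied to adata.obs would replace the hand-written dict, provided each sample has a single context label.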