sachsURAP / LSSR-2019

Data and code for the Life Sciences in Space Research 2019 paper. Concerns modeling murine Harderian gland tumorigenesis induced by mixed radiation fields.
GNU General Public License v3.0

Meetings, assignments, verbal discussions and questions, including those about math and programming #3

Closed rainersachs closed 6 years ago

rainersachs commented 6 years ago

I suggest we consider every file except today's upload by me and the Rmd file as obsolescent and try to transfer their information into the file I just uploaded. I forgot to ask Edward how to merge files within GitHub. I have done some of that (painfully) with R-Studio. Once we are sure a particular other file is obsolescent, let's rename it to contain the word OBSOLETE. I think maybe we can already do it to every file except 2: the one I just committed that ends in GH.R (for GitHub) and Mark's file. Mark: please change your filename to something more informative ending in GH.R, e.g. OurIDERs_vs.2017ccHazardGH.R or something.

rainersachs commented 6 years ago

Edward: Thanks for the general style comments in HGSynergyMain.R. I only just found them and will read them soon. Can you open an issue which contains advice, move the style comments there, add the comment you made today about the command to freeze everything above a buggy line while you play with the line, and periodically update the advice? As best I can tell, if we can somehow merge the information in HGSynergyMain.R plus the two files I just renamed to end in GH.R we are almost done except for 95% CI on I(d). But merging without losing information or introducing bugs looks like it could be a very nasty job to me. Do you know how to merge within GitHub? If so I think the best way might be for the two of us to do it together in a long session. Otherwise it would, I think, take either one of us alone something like 20 hours to do it.

Edward: please close the other two issues unless you have reason to keep them open.

eghuang commented 6 years ago

Ray: I think we may have to do this manually. There are two cases of easy merges that I know of:

  1. File A and file B contain code that is mutually independent, i.e. nothing in either script affects the behavior of the other. We can just copy and paste the whole script in this case.
  2. File B is an updated version of file A, i.e. file A's script is obsolete. We can just copy B over A or delete A.

Our merge obviously falls into neither of these categories and I am not aware of a way for GitHub to know which code we wish to keep or discard between our files. I will try merging the two today and tomorrow, and if that does not work then we can work on it together.

rainersachs commented 6 years ago

Thanks. Please write some plots for each file and look at the plots before merging so you have a sense of what each does. And thanks for the style guide.

On Tue, Aug 22, 2017 at 3:52 PM, Edward Huang notifications@github.com wrote:


eghuang commented 6 years ago

I've merged the scripts into one file, HGsynergyMain_merge.R, such that there are no errors in running the code and the plots are the same as they were when the code was in separate files. I've also removed redundant code between the files and left comments with the tag #egh where values differ between files (e.g. phi <- 3e3 #egh phi <- 1000 in HGsynergyHZE_GH.R).

rainersachs commented 6 years ago

super! I'll look tonite!

On Wed, Aug 23, 2017 at 5:02 PM, Edward Huang notifications@github.com wrote:


rainersachs commented 6 years ago

Edward (and Mark). HGsynergyMain_merge.R runs nicely on my machine. Commenting is really clear. Thanks.

I will next work on trying to eliminate redundancies and obsolete parts from that script. For example I think we ended up with 2 inconsistent models for the fast light ions, with the one on lines 134-144 and line 195 obsolete (though functional). And my comments need to be brought up to date, not to mention needing to try to follow your style tips.

I suggest you and Mark next work on the last unexplored and hardest part: Monte-Carlo simulations for MIXDER 95% CI using variance-covariance matrices. Assume a mixture of HZE only (no light, fast, low-LET ions until I have cleaned them up a little and eliminated the obsolete version, and you guys or I have solved the bugs mentioned in the comment on line 293 for the version that is not obsolete).

Mark (and Edward). I think we may be able to use the theory of functions of a complex variable in a non-trivial way in connection with defining an IDER with an ODE initial value problem dE/dd=F(E) and E(0)=0, where F(z) is a function of a complex variable z=x+iy=E+iy that has no singularities on the non-negative part of the real axis and whose restriction to that domain is real (e.g. any polynomial function of z with real coefficients). The idea is to see if there are relations between the locations of the zeros and the behavior of the IDER. For example F=1+z^2 has zeros only on the y axis and the resulting IDER has the unpleasant property that E reaches +infinity at a finite value of d, namely d=pi/2; is that just a coincidence? In the 19th century people got a lot of mileage out of looking for the location of zeros (and, of course, for the location of singularities). I'll write this up for you guys if I ever have time.
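
For the example F=1+z^2, the finite blow-up dose can be checked directly by separating variables. The short derivation below is only a worked check of that one example. The second display is a general remark that holds under the added assumption that F is continuous and strictly positive on the non-negative real axis.

```latex
\frac{dE}{dd} = 1 + E^{2}, \qquad E(0) = 0
\;\Longrightarrow\;
\int_{0}^{E} \frac{dx}{1 + x^{2}} = d
\;\Longrightarrow\;
\arctan E = d
\;\Longrightarrow\;
E(d) = \tan d , \qquad E \to +\infty \ \text{as}\ d \to \tfrac{\pi}{2}^{-} .

% More generally, if F > 0 on [0, \infty), the dose needed to reach effect E is
d(E) = \int_{0}^{E} \frac{dx}{F(x)} ,
\qquad
d^{*} = \int_{0}^{\infty} \frac{dx}{F(x)} .
```

So E reaches +infinity at a finite dose d* exactly when the last integral converges. For 1+x^2 it equals pi/2, in agreement with the example. A case such as F=(1+z)^2, whose only zero is on the negative real axis, gives E=d/(1-d) and also blows up at a finite dose (d=1). This suggests that the growth of F at large E matters at least as much as the location of the zeros.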

rainersachs commented 6 years ago

The correction needed for beta and lambda was in line 231, not line 247. I think the file is OK now.

rainersachs commented 6 years ago

I would like to call a meeting. I am available most times, 7 days a week except Saturdays AM and Monday Sept. 11 PM. Mark and Edward please agree on a time and let me know.

I tentatively decided on a low LET model, and just uploaded a file (merge2) which I think has everything we have done to date and no redundancies, assuming that low LET model. However to make a final decision on the low LET model I need Mark to calculate information criteria (Akaike and Bayesian) and compare with 17Cuc. I think Edward should work on Monte Carlo calculations of 95% confidence intervals for HZE MIXDERs. These will be useful even if we later change the low LET model because our HZE model is already decided. They will also act as templates for calculating more general MIXDERs. They will also ensure Edward gets involved in very specific details of our particular calculations.

eghuang commented 6 years ago

I contacted Mark yesterday and will let you know the meeting time as soon as I can.

eghuang commented 6 years ago

How about next Wednesday, 2pm at Strada?

rainersachs commented 6 years ago

Wed 2 at Strada is fine.

Mark: Are you getting these issues messages? Do you plan to sign your learning contract?

rainersachs commented 6 years ago

Hi:

We meet Wed. the 6th, 2 PM at the Strada. I think we had better meet weekly during the semester to become more focused, so please bring your schedules.

Mark: please come prepared to report how far you have gotten on the following assignment which we discussed earlier and try to ask enough questions to make sure you know how to carry out the assignment within the next few weeks. The assignment is the following: Study the theory behind and implementation of calculating information criteria (ICs), especially Akaike and Bayesian. Calculate them for our low LET IDER in merge2 on Edward’s web site. Calculate them for the NTE1 and NTE2 low LET IDERs in 17Cuc the same way (hold background and alpha_lambda fixed; make the IDERs be zero for dose 0; calibrate their 2 parameters using only the non-zero dose data). Compare. Once this is done you will be asked to repeat the calculation for HZE IDERs. About midway in the semester I will ask you to give a talk to our HG group on the theory of IC. If you are at a loss to decipher the assignment as stated, please try to review the relevant terminology and relevant lines in merge2. In addition, if you haven’t changed your mind, please read and sign your learning contract.
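
As a starting point for the IC assignment, here is a minimal R sketch of computing AIC and BIC for two competing 2-parameter dose-effect models fitted with nls(). Everything in it is an assumption made for illustration: the data are simulated (not the Harderian gland data), and the two model forms are generic stand-ins, not the merge2 low LET IDER or the NTE1/NTE2 IDERs of 17Cuc. Both models are zero at dose 0 and are calibrated on non-zero doses only, as the assignment asks.

```r
# Simulated, hypothetical dose-effect data (non-zero doses only).
set.seed(1)
dose <- seq(0.05, 2, length.out = 30)
prev <- 0.4 * (1 - exp(-1.5 * dose)) + rnorm(length(dose), sd = 0.02)
dat <- data.frame(dose = dose, prev = prev)

# Two candidate 2-parameter models, both equal to zero at dose 0.
fit_exp <- nls(prev ~ A * (1 - exp(-B * dose)), data = dat,
               start = list(A = 0.5, B = 1))
fit_lq <- nls(prev ~ a * dose + b * dose^2, data = dat,
              start = list(a = 0.5, b = -0.1))

# Built-in information criteria (lower is better).
ic_table <- data.frame(
  model = c("saturating exponential", "linear-quadratic"),
  AIC = c(AIC(fit_exp), AIC(fit_lq)),
  BIC = c(BIC(fit_exp), BIC(fit_lq))
)
print(ic_table)

# The same numbers by hand, to see what the functions do.
# k counts the fitted coefficients plus one for the residual variance.
log_lik <- as.numeric(logLik(fit_exp))
k <- length(coef(fit_exp)) + 1
n <- nrow(dat)
aic_by_hand <- -2 * log_lik + 2 * k
bic_by_hand <- -2 * log_lik + log(n) * k
```

Comparing ICs only makes sense when the models are fitted to exactly the same data points, which is why both fits above use the same data frame.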

Mark and Edward: I had a very strong new URAP applicant, Yimin Lin, for this semester. I decided to put him in charge of the theory and implementation of error analysis based on Monte Carlo sampling of variance-covariance matrices, which we eventually need for 95% I(d) CI. He will also report to us on the theory during the semester.

Edward: I would like you to be in charge of quality control and testing of the programs during the semester and eventually report to us on that. Also let’s take a chance that Mark’s IC answers will not be unfavorable. So please try to extend merge2 to the case of a mixture involving N≥1 HZE and one low LET ion, using the IDERs in merge2.

See you guys Wed! Ray

rainersachs commented 6 years ago

Minutes of meeting 9/6/17. Mark Ebert, Edward Huang, and Ray Sachs met at the Cafe Strada for an hour. We agreed on the following plan for the semester. All 4 of us will work on a script that will be able to apply synergy theory to the new Harderian gland (HG) mixed GCR radiation field data that will be available in some months, 18 months after the actual experiments due to tumorigenesis lag time. Concurrently, Mark will be in charge of breeding information criteria (IC) and eventually explaining them to our pod or the whole URAP class; Yimin will be in charge of breeding our variance-covariance matrices in R, caring for them, feeding them, and showcasing them. Edward will be in charge of debugging, testing and quality control of our program(s). I reiterated that if time permits there are many additional instructive and useful calculations and ideas to pursue, which are, however, of lower priority. We decided the three of us will meet again 11:20 Thursday the 14th at the Strada. We are hoping that we can find times bunched in such a way that I can meet with each of you, including Yimin, individually for a half hour or so and we can also have shorter 4-way or at least 3-way meetings and student pairwise meetings on the same day. If Yimin can make it Thursday mornings that will work. If we cannot find times then I will continue to meet Edward and Mark Thursday mornings weekly, meet Yimin Fridays at 2:30, and we will arrange occasional 4-way meetings at other times. Edward has locations where 4-way meetings are convenient and I suggest from now on he be responsible for all meeting organization. Edward explained some R commands, notably browser(). He will continue to add to the other issue, on style. We discussed what Mark needs to do to find the IC of immediate interest. Please make sure Yimin and Mark have access to these issues and get notifications when comments are added.
Any material I have that is of interest to our entire 4-pod will only appear on this repository from now on. Please post any additions or corrections to these minutes here.

Thanks!

rainersachs commented 6 years ago

In my minutes of the Wed. 9/6/17 meeting I forgot to add an additional assignment that Edward and I agreed on. At the moment merge2 has a function to calculate baselines for mixtures of any number N of HZE ions and one to calculate baselines for a mixture of one HZE ion with one low LET ion. The latter should be extended to mixtures with N HZE ions and one low LET ion. Maybe we only need one R function to calculate simple and incremental effect additivity baselines for N>=1 HZE and either 0 or 1 low LET ion.
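
One possible shape for such a single function is sketched below, for the simple effect additivity baseline only, taken here as S(d) = sum over j of E_j(r_j d). The incremental baseline needs an ODE solve and is not covered. The function name `simple_additivity` and the three IDERs are hypothetical and not taken from merge2. The idea is just that passing the IDERs as a list makes "N HZE plus 0 or 1 low LET ion" a matter of list length.

```r
# total_dose: vector of total mixture doses d
# ratios:     dose fractions r_j of the components, summing to 1
# iders:      list of functions, one per component, each mapping dose -> effect
simple_additivity <- function(total_dose, ratios, iders) {
  stopifnot(length(ratios) == length(iders),
            abs(sum(ratios) - 1) < 1e-8)
  # Effect of each component at its own share of the dose.
  effects <- Map(function(r, ider) ider(r * total_dose), ratios, iders)
  # Element-wise sum over components.
  Reduce(`+`, effects)
}

# Hypothetical IDERs, for illustration only.
hze_a <- function(d) 0.30 * (1 - exp(-4 * d))
hze_b <- function(d) 0.25 * (1 - exp(-6 * d))
low_let <- function(d) 0.05 * d

doses <- seq(0, 1, by = 0.25)
s_hze_only <- simple_additivity(doses, c(0.5, 0.5), list(hze_a, hze_b))
s_with_low <- simple_additivity(doses, c(0.3, 0.3, 0.4),
                                list(hze_a, hze_b, low_let))
```

The same interface (a list of IDERs plus dose fractions) could in principle be reused for the incremental baseline, so that both baselines are called the same way.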

rainersachs commented 6 years ago

Yimin and I met Friday. His main programming assignment for the semester is writing code to calculate 95% CI for I(d) baseline MIXDERS. He will start with the three-ion HZE mixture defined in line 77 ff. of the HGsynergy_merge2.R code. For that mixture he will use the variance-covariance matrix determined by nls( ) regression in the code. He will use appropriate R functions that are already in the relevant packages. During the first few weeks he will be mainly concerned with getting the calculation for this specific example to work. Later in the semester he will generalize the calculation and also go more deeply into understanding the math/stat behind the packaged R functions.

Some logistic items that resulted from the meeting are the following. Weekly 4-way meetings are not feasible. Yimin and I will meet Fridays around 2 in my office. Edward, Mark, and I will meet Thursday mornings. In addition we will arrange at least one 4 way meeting sometime within the next month and one 6-way meeting with the other pod sometime during the semester. Yimin and Edward will both be working on HGsynergy_merge2.R; please coordinate, e.g. by using GitHub pull requests. My phone numbers are: 510-658-5790 for most times; 510-206-7483 only when I am already on the way to one of our meetings.

I think the project is moving forward well. Thanks to all 3 of you.

rainersachs commented 6 years ago

Yimin: See you tomorrow 2:30, my office. I downloaded improved versions of the previous paper and of a nice improvement Edward made on the code. But if you prefer you can keep working on the earlier code and reading the earlier paper -- once we have one Monte Carlo CI estimate, we will be able to generalize pretty easily I think.

yiminllin commented 6 years ago

Hi guys, I just pushed the confidence interval code to the branch "ConfidenceInterval". The only changes I made to the original code are adding new code, commenting out the plotting code, and moving obsolete files into a folder. I also added some comments for readability. If the code works well I will merge it into the main branch. This is just the first version, so if there is any issue please tell me. I will try to implement a naive version of calculating the CI by next week. Have a nice weekend.

rainersachs commented 6 years ago

errorMessages.docx: this .docx file mentions some issues with Yimin's confidence interval code. In brief, it becomes abnormally slow, so slow that it stopped altogether after 87 Monte Carlo iterations, and it has problems with dose intervals being "too small". However, it did produce a graph which looked plausible.

Also, while stumbling around I made a superfluous branch for this repository. Yimin, please delete it.

yiminllin commented 6 years ago

Hi Professor, stopping after 87 iterations is expected because 87 iterations means 87 dose points rather than 87 Monte Carlo samples, which means we have drawn 87*500 Monte Carlo samples. The step size also seems weird to me, and I think the reason behind it is the "deSolve" package we use to solve the ODE. In order to get an accurate result, the ODE solver takes arbitrarily small step sizes. I will look into this issue later.

rainersachs commented 6 years ago

Oh. I see. Thanks for your prompt reply. That sounds less bad than I thought it was. I agree ode() is one of the problems. It is adaptive, so it is presumably taking small steps already near dose zero. But the slowing down seems more typical of the behavior when some vector has been initialized to a certain length and the program is adding information for indices bigger than the length.

No hurry!!

Ray
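
The vector-growth behavior Ray describes can be reproduced in isolation. The sketch below is not taken from the project scripts, and the helper names `fill_growing` and `fill_preallocated` are made up. It shows that writing past the end of a vector silently extends it, and it times a loop that grows its result against one that allocates the full length first. How large the timing gap is depends on the R version and the machine, so no particular speed-up is claimed.

```r
# Starts empty and writes past the current end on every iteration,
# so R has to extend the vector as the loop runs.
fill_growing <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) out[i] <- sqrt(i)
  out
}

# Allocates the full length once, then overwrites in place.
fill_preallocated <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- sqrt(i)
  out
}

# Assigning beyond the initialized length extends the vector and pads with NA.
short <- numeric(3)
short[5] <- 1
length(short)  # 5
short[4]       # NA

# Both strategies return the same values; only the running time can differ.
n <- 1e5
time_growing <- system.time(a <- fill_growing(n))["elapsed"]
time_prealloc <- system.time(b <- fill_preallocated(n))["elapsed"]
same_result <- identical(a, b)
```

If the CI loop stores one result per dose point or per Monte Carlo draw, sizing the result vector or matrix before the loop rules this cause out cheaply.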

yiminllin commented 6 years ago

Hi guys, I just added a few lines to implement the naive method to calculate CI (consider each parameter separately), and the graph seems Okay to me. I pushed the code to the master branch directly so hopefully I did not mess up the commits...

rainersachs commented 6 years ago

Hi: Minutes for week of Sept. 11, 2017. I met Thursday with Mark at the Strada, Edward over the phone, and Friday with Yimin at my office. It is possible that as a result of the subsequent calculations by Edward and by Yimin we have already got code which addresses all the major topics that might arise during the whole project with the exception of calculating information criteria (IC) and comparing to earlier models. I am confident that the IC can be calculated so we may be finished as far as possible fatal obstacles are concerned. If so we still have a whole lot of work to do: eradicating bugs; cleaning up the code in many ways; cleaning up the commenting and variable names in many ways; adding models that consider only targeted effects (TE), not both NTE and TE (TE models are simpler than the TE+NTE model we are working with now, and simpler than the NTE models in 17Cuc, which are actually TE+NTE); writing a report; cleaning up GitHub; understanding the math and motivations behind the R programs; etc. Just that stuff might take all semester but all except perhaps eradicating bugs can clearly be done; none except the bugs is likely to be fatal. However the confidence interval (CI) part runs so slowly on my computer that I have not had a chance to judge if there are mistakes in the code that allow it to run but give the wrong answers. I will try to run the code overnight, tomorrow night if I have time, and then see if I can do some checks using just the environment without re-running the CI parts.

Agenda for week of September 18. Programming and GitHub: Mark is working on ICs; Yimin, if he has still more time than he has already spent, is working on improving the CI part of our program; Edward, if he has still more time than he has already spent, should work on checking his MIXDER results and/or on cleaning up GitHub and giving a protocol for its use. I don't use GitHub correctly as regards folders and pulls and pushes and have already added superfluous stuff that I don't know how to get rid of. All of us should continue to study the relevant literature as time permits.

Outlook: Quite possibly all the rest will be plain sailing, tedious at times, but all amenable to systematic improvements. But we cannot be sure of that until the code runs faster than it does now; so some chance of running into an obstacle that requires drastic major changes in our approach remains. On balance we are ahead of where I thought we would be at this point in the semester.

rainersachs commented 6 years ago

Meeting minutes:

Edward and I met at the Strada today for about 90 minutes, discussed a lot of details, discussed over-all plans for the next 9 months, and got a whole lot done.

At Edward's request I put a .pdf copy of the paper submitted today, SynergyRR, in Edward's repository.

Edward's MIXDER program seems to run well. If Yimin's program, to be discussed tomorrow, works as well, all the rest of the project will almost certainly be feasible and we can start to plan our paper.

Assignment for next week: Edward will see how Yimin's code runs on his computer. He will produce a temporary, truncated version of merge2.R which omits Yimin's code so I can run it on my computer quickly and start to implement my quality control assignment. He will do a lot of work on GitHub, e.g. teaching us how to run .Rmd programs on GitHub if they won't run on our own computers. Mark will continue to program IC calculations. Yimin's assignments will be posted after he and I meet tomorrow.

eghuang commented 6 years ago

@rainersachs I have created a new file doseExploration.R for your quality control tests which omits Yimin's CI calculations. I also ran all of merge2 and reproduced my plots without issue so I'm not sure where you're getting an error. Perhaps try reverting your merge2.R to the current version?

Also, a quick note on Rmarkdown: .Rmd files are rendered by GitHub by default, so we don't need to actually run anything. For example, this is an Rmd vignette written by a colleague:

https://github.com/cmerow/meteR/blob/master/vignettes/meteR_vignette.Rmd

GitHub shows the output of the file by default, and you can also view the code itself by clicking "raw". I will update this post or make new posts as I make changes to the repository.

UPDATES:

  1. Yimin's code also seems to run very quickly for me, < 10s so I'm not sure what's causing the long runtimes on your machine.

  2. I cleaned up the repository a bit. There are a few files that I didn't touch (Mark's files).

rainersachs commented 6 years ago

I think for the time being we may need to keep Yimin's HZE CI script separate from HGsynergyMain_merge2.R and I just downloaded a file which on my computer implements that separation. More generally we need a protocol to avoid stepping on each other's feet. I suggest we aim in this repository for a protocol which allows only Edward to commit files to the main branch. Yimin and I should have to ask his permission (e.g. as the reviewer) via GitHub mechanisms like making a branch and a pull request, which I am in the process of trying to learn to use efficiently.

rainersachs commented 6 years ago

At our meeting Yimin pointed out that his CI calculation can probably be sped up a lot by using a single set of 4 parameters for each of 500 MIXDERs instead of generating new parameters at each dose point of one MIXDER. That is anyway the correct approach in principle: if we have misestimated the parameters then we need a single better set and that will apply to each dose point. Speeding up would be a big plus.
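
A minimal sketch of the "one parameter set per curve" approach is given below, for a single 2-parameter IDER rather than a MIXDER so that no ODE solver is needed. It is not Yimin's code. The data are simulated, the model E = A*(1-exp(-B*dose)) and all numerical values are assumptions, and MASS::mvrnorm() is used to draw parameters from the nls() variance-covariance matrix.

```r
library(MASS)  # provides mvrnorm()

set.seed(42)
# Virtual data from a hypothetical IDER with A = 0.5, B = 2.
dat <- data.frame(dose = rep(seq(0.1, 2, by = 0.1), each = 3))
dat$effect <- 0.5 * (1 - exp(-2 * dat$dose)) + rnorm(nrow(dat), sd = 0.02)

# Calibrate and extract the estimates and their variance-covariance matrix.
fit <- nls(effect ~ A * (1 - exp(-B * dose)), data = dat,
           start = list(A = 0.4, B = 1.5))
est <- coef(fit)
sigma <- vcov(fit)

# Draw 500 parameter sets, respecting the correlation between A and B.
n_draws <- 500
draws <- mvrnorm(n_draws, mu = est, Sigma = sigma)  # 500 x 2 matrix

# Each draw defines one whole curve: the same (A, B) at every dose point.
dose_grid <- seq(0, 2, length.out = 21)
curves <- t(apply(draws, 1, function(p) p[1] * (1 - exp(-p[2] * dose_grid))))

# Pointwise 95% interval across the 500 curves (rows: 2.5% and 97.5%).
ci <- apply(curves, 2, quantile, probs = c(0.025, 0.975))
```

For a MIXDER the only structural change would be that each of the 500 parameter draws feeds one full baseline calculation over the dose grid, so the expensive step runs 500 times in total rather than 500 times per dose point.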

Both Yimin and Edward have emphasized that we might as well do version control manually instead of insisting on the use of GitHub machinery designed for much bigger programs, with many more collaborators, and much more stringent deadlines. So I withdraw my previous comments on needing to use branches and reviewers; everybody can commit directly (but please not indiscriminately -- in case of doubt ask Edward) to Edward's main branch and we will be able to reconcile discrepancies by hand.

eghuang commented 6 years ago

Based on the methods sections described in Ray's recent synergy theory paper (a draft is located in the folder misc_materials), it appears that the script is approaching its final edits. I will begin cleaning up and organizing merge2.R with respect to several objectives:

  1. The raw script and its calculations should be very readable to researchers in this field who are at least superficially acquainted with R.
  2. The script should reflect good coding practice and style.
  3. The script should clearly reflect principles of reproducible science and chronologically follow our own methodology.
  4. The script should be easily grafted to an Rmarkdown file if we choose to do so.

rainersachs commented 6 years ago

I did not mean to come across as quite so optimistic. Even assuming Yimin's plan to speed up CI calculations works as we think it will, there are other things that could still delay finalizing the scripts. For example, our models to date all assume that both TE and NTE are important. We still need to devise and analyze the TE-only models that assume NTE are negligible. These will be simpler than the models we already have, so when we have programmed the models we are working on we will be able to program the TE-only models. But eventually we must try to guess which type of model, NTE+TE or TE-only will be more appropriate as regards predicting astronaut cancers. That will involve using the scripts, including information criteria, but also experimental considerations. We do not really know the functionality we need in our scripts until we have written and used them, so we have to prepare for surprises.

eghuang commented 6 years ago

In that case, unless you think it would be better to not make any edits now, I will try to keep my changes from affecting the functionality of the script.

rainersachs commented 6 years ago

No, it's fine to edit now and your criteria are excellent. But I think the edits will give an improved working version to build on, which we certainly could use; I doubt if we can yet think of a near-final version.

yiminllin commented 6 years ago

Hi guys, I just pushed the improved version of calculating CI to ExploreHGsynergyHZEplusYimin9.22.17.R, without changing the main file. It should be pretty fast (~3 minutes on Mac), and the result looks plausible. Running the CI part will generate some error messages about step sizes, which I will try to look deeper into in the future.

rainersachs commented 6 years ago

Super! Thanks a lot. That sounds like it is the last major worry taken care of.

Edward made HGsynergyMain_merge2.R a lot easier to work with by taking out 100 redundant lines. So this weekend I can work on devising an HZE model which neglects NTE. It will be a simpler alternative, CalculateHZE.TEonlyC, to CalculateHZEC and use the same data. We will need all the methods (calibration, calculating I(d), CI) to apply to this alternative as well. We will need a few minor functions to compare the two models. Then, surprisingly, we will probably be finished with the programming apart from cleanup and testing.

The major bottleneck as regards our paper will be writing the paper. Probably the best plan is to aim for a radiobiology or biology journal and in that case for me to write the first draft, with you guys then improving it. Let me know if you want to suggest an alternative plan (how fast are CS or stat papers written/generated these days?). If the paper last year's CA pod just submitted for publication gets accepted that will speed up our paper somewhat, but I expect a long delay in any case.

See you tomorrow, Edward and Mark, Friday Yimin. Great progress.

eghuang commented 6 years ago

@rainersachs I've run Yimin's old code on merge2.R and your merge2RKSB.R and the resulting plots look nearly identical to me. Here are the plots:

merge2.R: [image: merge2]
merge2RKSB.R: [image: merge2rksb]

I've also moved the .csv files to a folder called data and created a new LICENSE file with specifics for our GNU v3 license. This is standard practice for repositories with code intended for public use.

I've also decided not to delete .gitignore for now because it tells Github not to track certain files in our local RStudio project directories that rightfully should not be tracked. I think it's safer to keep the files here until we have more information.

Lastly, I've added two files and a folder to misc_materials. plot_example.Rmd describes how to create simple plots in Rmarkdown and render them in GitHub. plot_example.md is the output of the Rmarkdown file with the actual plots shown. The .md file sources its plot images from the plot_example folder, which is automatically created when the Rmd file is rendered.

rainersachs commented 6 years ago

Hi Edward.

Yes! Those 2 figs. are just what I needed to see. They duly show that my corrections did not (apparently) impact Yimin's results except for very small, visually imperceptible, and thus negligible amounts (less than about 1 part in 10^4 as far as IDERs and MIXDERs are concerned except at very low doses and apparently not larger in the CI calculations). Thanks.

Tomorrow, after I talk to Yimin, I'll probably ask you to make a file incorporating all the improvements the 3 of us have made this week. I'll study your other changes and try out the way the .csv file works as soon as I can. But my main priority will be devising TE models now that everything is working more smoothly.

GitHub itself has a mechanism for adding a license to a repository and I'm glad you chose GNU GPL v3.

rainersachs commented 6 years ago

Hi! I just copied HGsynergyMain_merge2.R. I will work on it this weekend on my computer and push it to the new master on GitHub Sunday nite. If you want to change it in the meantime, before Sunday nite, go ahead but please let me know, and I will put your changes into my new version by hand before pushing.

rainersachs commented 6 years ago

We are still moving much faster than I anticipated.

I am done with HGsynergyMain_merge2.R for this week. Result is temporarily called HGsynergyMain.R.

Suggested Assignments for this coming week include:

Edward: Try to deal with HGsynergyMain.R to the extent of preventing any additional complications this week. If time, complete the computational implementation subsection of our new paper because I thought of a (very ambitious and time consuming) writing project you could consider after that is done. Will describe Thursday.

Yimin: In the long run we will need the CI part to include two types of calculations: (1) several HZE NTE ions and optionally 1 low LET ion; (2) several HZE TE ions and optionally 1 low LET ion. Then when I start writing the paper I will be able to ask for lots of figures. I don't have any idea what figures until I have made an outline. Also, please do as much as you conveniently can to make it as easy as possible for me to play with CI, e.g. fewer dose points. Do as much of the above as you have time for this week. No deadline tho.

Mark: continue to work on understanding IC as time permits. I have just found that, as I expected and hoped, IC for our new HZE TE models are inferior to IC for our HZE NTE models. This will be an important point in our paper so the more we understand about IC the better. No hurry and no deadline.

rainersachs commented 6 years ago

Writing papers for Radiobiologists. A sub-project that is strictly optional, potentially very time consuming, very instructive both per se and for understanding our eventual paper in detail, very useful to me, and subject to repeated revisions as the paper evolves.

Just uploaded a .pdf useful in writing a first draft of the Methods section for Edward (and if Yimin wants I will adapt the .pdf to include instructions for him to work on writing a first draft of the start of the Results section)

rainersachs commented 6 years ago

Edward & I met at the Strada and discussed cleaning up the main script and also writing our paper. He does not get the error messages I get when running the script so maybe they are just because my computer is too slow. As time permits he will try to write a first draft of the mathematical methods section of the paper.

eghuang commented 6 years ago

To add, I have pushed HGsynergyMain.R as an update to HGsynergyMain_merge2.R and will begin making style/clarity edits. Variable and function names will be changed to snake_case for readability and handling of abbreviations.

yiminllin commented 6 years ago

Hi guys, I just updated the file ExploreHGsynergyHZEplusYiminOctober.R so that the CI part works for HZE TE & NTE with low LET. No changes were made to the main file.

rainersachs commented 6 years ago

Edward: Yimin and I decided to not merge his new file yet; when we feel more certain that no further big changes are coming then will be the time for a semi-final merge.

rainersachs commented 6 years ago

Hi Yimin: my agenda for tomorrow is the following.

  1. If you have had time to make progress on coding or figures outline it for me.

  2. I will outline what was discussed when Edward and I met.

  3. How about checking your 95% confidence interval code in two ways. The first is to pick a 2-parameter IDER, e.g. E= A(1-exp(-Bdose)) for some specific real positive A and B. Then construct a 2x2 variance covariance matrix which corresponds to a correlation about halfway between 0 and 1. Then generate some virtual data. Now calibrate A and B using your virtual data and nls(). Use the resulting variance-covariance matrix and your Monte Carlo Method to generate the CI.

  4. To check in addition that your methods work for mixtures, choose one IDER so simple that you can calculate the inverse function, e.g. E_1 = A(1 - exp(-B_1*dose_1)). Pick a second IDER that is "similar", i.e. has "constant relative potency" as defined in the supplementary information for the Dae Woong Ham submitted paper, i.e. there is a real positive constant K such that E_2(d_2) = E_1(K*d_2). Then, as proved in that supplement, one can integrate the equation of incremental effect additivity explicitly and the result is I(d) = E_1(d_1 + K*d_2) = E_2(d_2 + d_1/K). So now one should be able to use virtual data, similar to the method in 3, to check the part of your program that calculates the CI for I(d). Unless K=1, E_1 and E_2 refer to 2 different agents, whose effects are measured in 2 different kinds of experiments (except for dose = 0: a mouse doesn't know whether agent 1 or agent 2 is being used for sham irradiation at zero dose). So you can assume (A_1, B_1) is statistically independent of (A_2, B_2).
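The check in item 3 might be sketched roughly as follows. This is only an illustration, not Yimin's actual CI code: the "true" values A = 0.5 and B = 2, the noise level, the dose grid, and the names ider, fit, and draws are all made up for the example. The variance-covariance matrix here is taken from the nls() fit rather than constructed by hand, and the parameter draws use base R's chol() instead of whatever sampler the script uses.

```r
set.seed(2017)

# Hypothetical 2-parameter IDER: E = A * (1 - exp(-B * dose)).
ider <- function(dose, A, B) A * (1 - exp(-B * dose))

# Virtual data from assumed "true" parameters plus Gaussian noise.
true_A <- 0.5
true_B <- 2
dose   <- seq(0.05, 2, length.out = 40)
effect <- ider(dose, true_A, true_B) + rnorm(length(dose), sd = 0.02)

# Calibrate A and B with nls().
fit <- nls(effect ~ ider(dose, A, B), start = list(A = 0.4, B = 1.5))
est <- coef(fit)
V   <- vcov(fit)   # 2x2 variance-covariance matrix of (A, B)

# Monte Carlo: draw (A, B) from a bivariate normal N(est, V).
# chol(V) is upper triangular with t(R) %*% R = V, so z %*% R has covariance V.
n_draws <- 2000
z       <- matrix(rnorm(n_draws * 2), ncol = 2)
draws   <- sweep(z %*% chol(V), 2, est, "+")

# Evaluate the IDER for every draw (doses in rows, draws in columns),
# then take pointwise 2.5% and 97.5% quantiles at each dose.
curves <- apply(draws, 1, function(p) ider(dose, p[1], p[2]))
ci     <- apply(curves, 1, quantile, probs = c(0.025, 0.975))
lower  <- ci[1, ]
upper  <- ci[2, ]

# Fraction of doses at which the known true curve lies inside the band.
truth    <- ider(dose, true_A, true_B)
coverage <- mean(truth >= lower & truth <= upper)
print(coverage)
```

Because the true curve is known here, a band that regularly misses it over repeated seeds would point to a problem in the CI code rather than in the data.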

See you tomorrow unless Evans is closed down or the Bay Area has been wiped out by a fire storm. In the latter case I fear you will not find me in heaven or purgatory; look for me in the alternative location at its best caffe.

In haste: Ray
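For item 4, the closed form can also be compared against a direct numerical integration before any CI is attached to it. The sketch below assumes the incremental effect additivity equation has the form dI/dd = r_1 E_1'(D_1(I)) + r_2 E_2'(D_2(I)), where D_j is the inverse function of E_j and r_j is the fraction of the total dose d contributed by agent j. That form is not written out in this thread, so treat it as an assumption. The parameter values and the helper name rk4 are hypothetical.

```r
# Assumed parameter values for illustration only.
A  <- 0.5
B1 <- 2
K  <- 0.6
r1 <- 0.3   # dose fraction of agent 1
r2 <- 0.7   # dose fraction of agent 2 (r1 + r2 = 1)

# IDER 1, its inverse, and its derivative.
E1     <- function(x) A * (1 - exp(-B1 * x))
E1_inv <- function(e) -log(1 - e / A) / B1
E1_der <- function(x) A * B1 * exp(-B1 * x)

# IDER 2 with constant relative potency K: E_2(d_2) = E_1(K * d_2).
E2     <- function(x) E1(K * x)
E2_inv <- function(e) E1_inv(e) / K
E2_der <- function(x) K * E1_der(K * x)

# Right-hand side of the assumed incremental effect additivity ODE.
slope <- function(I) r1 * E1_der(E1_inv(I)) + r2 * E2_der(E2_inv(I))

# Classical fourth-order Runge-Kutta for dI/dd = f(I) on a dose grid.
rk4 <- function(f, y0, d_grid) {
  y <- numeric(length(d_grid))
  y[1] <- y0
  for (i in seq_len(length(d_grid) - 1)) {
    h  <- d_grid[i + 1] - d_grid[i]
    k1 <- f(y[i])
    k2 <- f(y[i] + h * k1 / 2)
    k3 <- f(y[i] + h * k2 / 2)
    k4 <- f(y[i] + h * k3)
    y[i + 1] <- y[i] + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
  }
  y
}

d         <- seq(0, 2, by = 0.01)
I_numeric <- rk4(slope, 0, d)

# Closed forms from the text, with d_1 = r1 * d and d_2 = r2 * d.
I_closed_1 <- E1(r1 * d + K * r2 * d)
I_closed_2 <- E2(r2 * d + r1 * d / K)

max_diff <- max(abs(I_numeric - I_closed_1))
print(max_diff)
```

If the numerical and closed-form curves agree, the same virtual-data approach as in item 3 can then be applied to I(d), drawing (A_1, B_1) and (A_2, B_2) independently.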

rainersachs commented 6 years ago

Hi Edward: Your notation changes in the main file seem excellent. Thanks! However bayesian_ic should be changed to information_coefficients (we also use AIC, not just BIC). And I think calculate_id should be replaced by calculate_I(d). The capital I and the parentheses contradict your guidelines for names, but they are so much closer to the paper, which will use the notation I(d) about 100 times, that the inconsistency is worth it. When I later clean up the commenting to match the paper more closely I will use I(d) a lot. The deletion seems OK. Please don't change anything that might impact Yimin's calculations. He is now preparing a number of figures for me and says there are lots of issues. I'll write again after I've seen some figures and heard today what the issues are.

eghuang commented 6 years ago

Ray: I will change bayesian_ic_table to info_coef_table. I would be happy to change all instances of id in our function and object names to I(d) but R would interpret it as a function call and get confused. I don't think there is a solution for this, since parentheses are a special character in R. We could perhaps use calculate_I as an alternative, but that may be confusing.

Yimin's code runs fine in HGsynergyMain.R - I have not altered any of his code outside of name changes to objects created in the lines preceding his code.
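For reference, here is a toy version of what a table holding both criteria might look like. It is only a sketch on made-up data: the two models, the object names fit_exp and fit_lin, and the column layout are assumptions and do not reproduce the table in HGsynergyMain.R.

```r
set.seed(1)

# Made-up dose-effect data, not the hg_data values.
dose   <- seq(0.1, 2, length.out = 30)
effect <- 0.5 * (1 - exp(-2 * dose)) + rnorm(30, sd = 0.02)

# Two hypothetical candidate models fitted with nls().
fit_exp <- nls(effect ~ A * (1 - exp(-B * dose)), start = list(A = 0.4, B = 1.5))
fit_lin <- nls(effect ~ a * dose, start = list(a = 0.3))

# One table for both information criteria, hence the more general name.
info_coef_table <- data.frame(
  model = c("exponential", "linear"),
  AIC   = c(AIC(fit_exp), AIC(fit_lin)),
  BIC   = c(BIC(fit_exp), BIC(fit_lin))
)
print(info_coef_table)
```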

rainersachs commented 6 years ago

Thanks!

Before making the I(d) suggestion I worried about the parentheses so I tried

calculate_I(d) <- function(d) d^2

It ran just fine. I'm guessing that by starting with calculate_ you are telling R that everything before the next white space is a character string? In our case the function is inside a loop. Also it doesn't fit on one line, so its format is x <- function(d) { d^2 }. But I assumed that does not matter.

eghuang commented 6 years ago

I ran the same line, and got:

> calculate_I(d) <- function(d) d^2

Error in calculate_I(d) <- function(d) d^2 : could not find function "calculate_I<-"

After trying a few other examples to no success, I poked around for more information on valid naming. The R FAQ (https://cran.r-project.org/doc/FAQ/R-FAQ.html#What-are-valid-names_003f) explains that only "syntactically valid" names can be used for assignment. A syntactically valid name is:

a string the parser interprets as this type of expression. It consists of letters, numbers, and the dot and (for versions of R at least 1.9.0) underscore characters, and starts with either a letter or a dot not followed by a number. Reserved words are not syntactic names.

There is an exception where I can force calculate_I(d) to be a function name by using assign, but calls that use the name without backquotes will error, so every use of it would need backticks.

R has a base function called make.names (http://stat.ethz.ch/R-manual/R-patched/library/base/html/make.names.html) that takes in a string and outputs a valid "syntactic" name for that string, with modifications if necessary. Unfortunately, it does not recognize "calculate_I(d)" as syntactically valid.

> make.names("calculate_I(d)")
[1] "calculate_I.d."

After learning these things, I still believe that using I(d) in our function names will adversely affect the functionality of our script.
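To make the behavior concrete, here is a small sketch that can be pasted into a fresh R session. The toy function body d^2 is the one from the thread. The object d <- 2 and the names unquoted and syntactic are added only for the demonstration.

```r
# An object named d must exist, otherwise R stops earlier with
# "object 'd' not found" before it looks for the replacement function.
d <- 2

# Unquoted, R parses `calculate_I(d) <- value` as a call to a replacement
# function named `calculate_I<-`, which does not exist, so this fails.
unquoted <- tryCatch(
  eval(parse(text = "calculate_I(d) <- function(d) d^2")),
  error = function(e) conditionMessage(e)
)
print(unquoted)

# Back-quoted, the whole string is one (non-syntactic) name.
`calculate_I(d)` <- function(d) d^2

# The function can then be called, but only with backticks or via get().
`calculate_I(d)`(3)
get("calculate_I(d)")(3)

# make.names() shows the nearest syntactically valid name.
syntactic <- make.names("calculate_I(d)")
print(syntactic)
```

So a name containing I(d) is possible in principle, but it needs backticks at every definition and every call, which seems more error-prone than keeping _id and explaining it in the comments.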

rainersachs commented 6 years ago

Strange. I give up. We'll keep repeating what "_id" means in the comments. A good meeting with Yimin. I'll upload a summary by Sunday morning. Have a good weekend!

rainersachs commented 6 years ago

Hi! I found a bug. In the data frame newly named "hg_data" and formerly named "dfr", in the row still named "Nweight", all entries need to be multiplied by 0.01. Thus the row should now start: Nweight = 0.01*c(520,2048,1145,584,313,232,293,221,
and so on (with length(Nweight)=53).

I think this makes no difference in any calculation, only in what we say in writing the paper. The only change will be that when summarizing the models obtained by using nls() the program gives corrected values ~ 0.1 for "Residual standard error" instead of values ~0.95 that appeared previously.

Yimin, please make the correction in your version. Edward, please correct your version and hg_data itself, as well as any other places needed, in your files and in GitHub.

If either of you has time to check that indeed nothing is affected in the subsequent calculations so much the better.
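One quick way to run that check on a toy problem: multiplying all nls() weights by the same constant should leave the fitted parameters and their variance-covariance matrix unchanged, while the residual standard error is multiplied by the square root of that constant (sqrt(0.01) = 0.1, consistent with values going from ~0.95 to ~0.1). The data, model, and weights in this sketch are made up and are not the hg_data values.

```r
set.seed(53)

# Made-up data with 53 points, mirroring length(Nweight) = 53.
dose   <- seq(0.05, 2, length.out = 53)
effect <- 0.5 * (1 - exp(-2 * dose)) + rnorm(53, sd = 0.02)

# Hypothetical weights before and after the 0.01 correction.
Nweight_old <- round(runif(53, 200, 2000))
Nweight_new <- 0.01 * Nweight_old

fit_old <- nls(effect ~ A * (1 - exp(-B * dose)),
               start = list(A = 0.4, B = 1.5), weights = Nweight_old)
fit_new <- nls(effect ~ A * (1 - exp(-B * dose)),
               start = list(A = 0.4, B = 1.5), weights = Nweight_new)

# Parameter estimates and their variance-covariance matrix should agree.
same_coef <- all.equal(coef(fit_old), coef(fit_new), tolerance = 1e-6)
same_vcov <- all.equal(vcov(fit_old), vcov(fit_new), tolerance = 1e-6)

# Only the reported residual standard error changes, by a factor sqrt(0.01).
sigma_ratio <- summary(fit_new)$sigma / summary(fit_old)$sigma

print(same_coef)
print(same_vcov)
print(sigma_ratio)
```

Since the Monte Carlo CI code works from coef() and vcov(), this suggests the correction should not change the CI results, but it is still worth rerunning the real script to confirm.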

Last week Edward and I met and got a lot done. Yimin and I met, got a lot done on the figures, decided that the figures have top priority for the time being, decided that we should not merge HGsynergyMain and Yimin's version just yet, talked a bit about testing the program, and talked about some theoretical/mathematical issues.

This week I think we should meet at least to touch base assuming you guys have time. I will have a bit to say, though not much. Since you both have lots to do even without any new assignments, no harm if you cancel, but please let me know at least 18 hours in advance, if you can.

Mark: I hope your classes and applications are going well.

Good progress continues. Thanks!

yiminllin commented 6 years ago

Hi guys, sorry for the delay. I am a little busy this week (midterm, projects, homework etc.). I just updated my script for plotting, plottingYimin.R, but the code is disorganized and uninformative. Fortunately the plots generated seem good. I uploaded all the plots to the plots/ folder (.eps files are compact and the folder is under 1M, so I uploaded them directly; hopefully that will not be a problem while pulling). The names of the .eps files should be informative. Cheers.

eghuang commented 6 years ago

Hello, I've worked on tidying up HGsynergyMain.R over the last two weeks. I deleted and combined a lot of redundant code and made many small stylistic changes to the script. Most importantly, I've overhauled calculate_complex_id to handle MIXDERs for combinations of any of our IDERs for both the NTE and the NTE + TE models. The function should hopefully be more elegant now, and I've abstracted away a lot of the body so that it can handle other IDERs if we ever decide to construct new ones. I've also changed Yimin's CI part slightly so that it tests calculate_complex_id, and the new version seems to behave identically to the old version. The old calculate_complex_id is left commented out above the new one in case any of you would like to critically examine the two, or if Ray decides that we should keep the old one instead.

I'm fairly pleased with how the script is starting to look and believe that it now is very close to being presentable. Some significant tasks ahead include adding new iron data, adding Yimin's other CI calculations and plots, and deciding how we want to order our code.