sachsURAP / LSSR-2019

Data and code for the Life Sciences in Space Research 2019 paper. Concerns modeling murine Harderian gland tumorigenesis induced by mixed radiation fields.
GNU General Public License v3.0
1 stars 1 forks source link

Meetings, assignments, verbal discusions and questions, including those about math and programming #3

Closed rainersachs closed 6 years ago

rainersachs commented 7 years ago

I suggest we consider every file except today's upload by me and the Rmd file as obsolescent and try to transfer their information into the file I just uploaded. I forgot to ask Edward how to merge files within GitHub. I have done some of that (painfully) with R-Studio. Once we are sure a particular other file is obsolescent, let's rename it to contain the word OBSOLETE. I think maybe we can already do it to every file except 2: the one I just committed that ends in GH.R (for GitHub) and Mark's file. Mark: please change your filename to something more informative ending in GH.R, e.g. OurIDERs_vs.2017ccHazardGH.R or something.

rainersachs commented 6 years ago

Yimin: here is a summary of our meeting. Please read it, correct any errors, or let us know you have read it and have no errors to correct. Thanks!

  1. Eventually, but not right away, we will need to use a corrected data frame (with some Fe LETs changed to 185, and with 52 rows rather than just 47 because the “no isograft” data is included) for all calculations. In order to facilitate eventual additions of new data (and perhaps additional corrections due to new information) the code will always be written in such a way that changing a single data-frame entry in one spot automatically transfers that change to all parts of the code that use the spot.
  2. Eventually, but not right away, Edward will implement item 1 above.
  3. For the time being Yimin’s most important calculations, because they cover the one item where we are not yet reasonably sure of the how the outcome will affect our overall ideas about how dangerous HZE tumorigeneses is, is to compare TE only HZE models with NTE HZE models (which by definition incorporate both TE and NTE) using AIC (and BIC) information coefficients. We must always compare IDERs, which by definition are zero at zero dose, not models having the form Y_0+IDER where Y_0 is background (zero applied dose) prevalence. When calibrating adjustable constants from non-zero-applied dose data one has to take data~ background +IDER(dose). 3.1. Calculate AIC assuming Y_0 = 0.0275 % as in the present program. 3.2. Calculate AIC assuming Y_0 = 0.0500, almost twice as much, which, it has been argued, is a better approximation, based on more recent data rather than a convoluted combination of recent and 1980’s data. 3.3. Calculate AIC assuming Y_0= 0.0100, just to satisfy Sachs’ idle curiosity.
  4. Yimin’s present graphs are excellent but insufficiently labeled as regards the maximum doses, maximum effects, ion(s) used, type of calculation, which curve is which, and in some cases numerical values (such as ribbon widths). He will add this information, automated so that when we correct the data frame the graphs automatically will be tweaked accordingly. Later a few more graphs will probably be added as we write the LSSR paper.
  5. Yimin will make graphs of 95% CI (ribbon height) vs dose for NTE models of various mixtures, with CI calculated by varcov and by naïve error analysis (always 2 curves on one graph).
  6. Yimin will clarify his code and then release it to Ray (who will play with it and do some quality control) and Edward (who will rewrite it in a form suitable in a CRAN repository and do some quality control).
  7. Assuming I(d) [which should always be used for Monte-Carlo 95% CI calculations because we don’t like S(d) anyway] Yimin calculated how many Monte-Carlo sample paths are needed before 95% CI become stable in one example. For four HZE with NTE IDER + 1 low LET ion which carries 80% of the dose, and for dose range [0, 1 Gy), 200 sample paths were not quite enough, but 1000 were. We need a few more examples of this kind. The examples cannot be chosen systematically because there are too many possible mixtures (the high complexity problem we have faced all along). Yimin will try a few, starting with the same five ions but only 20% of the total dose in the low LET ion.
yiminllin commented 6 years ago

Hi guys, I finished implementing IC part. I implemented it based on Dae's code. Honestly I totally don't get the theory behind the information criterion stuff, so I just use the formula of AIC and BIC on Nonlinear Least Square for granted. The code works, but maybe there are some bugs I didn't notice. Also I experiment with different background (y0) as Rainer suggested. I didn't update my plotting script (time is limited), and I planned to organize them over this winter. I believe I will not have time over RRR week to do extra work for this project (exams to review and a presentation to prepare). Anyway, it is a great semester for me to work on this project! Good luck on the final~ Happy studying.

rainersachs commented 6 years ago

# Over All Plan for Spring Semester Starting Jan 1 we should carry out the following steps.

  1. Make a master file for the main calculations. Thereafter: this will be the only master version; it will live on GitHub; all changes to it should go through me; all other versions should only be in sandboxes in our own computers or, if any other versions are in GitHub, they must be clearly labelled in their filename as subordinate, obsolete, temporary, or some similar word.

    Yimin: in our latest meetings you described a whole lot of very useful and interesting results. It is time to make those available to the whole pod. I tried my best to deal with not using your script until you are satisfied with it, but that really didn't work during vacation. It cost me time and confusion. I keep getting new information from NASA colleagues or new papers; I keep getting new ideas to try out; I have to put the corresponding changes somewhere; I ended up often making the same change or tries in various different scripts, and introducing inadvertent inconsistencies; also I was sometimes afraid to delete stuff because I was not sure that I might inadvertently delete information found nowhere else. We will be making a big push at the start of the semester and it is time to be more systematic about the results we already have and all the comparatively minor changes we will still in all likelihood need.

    As soon as we have a master, even though it will still be quite preliminary, we can start on other jobs, and eventually get to some jobs that are actual new research, not just routine.

  2. Clean the master enough to make correcting and extending it a bit more convenient. Cleaning it entirely can wait till much later.

  3. Testing the master. All members of the mouse pod should try to find bugs and misunderstandings. I will leave line-by-line checks to you guys. Instead I will pick lots of cases where I can independently calculate or think I can guess results or trends, and see if the script gives figures which show the expected features.

  4. Clean up and add enough figures for drafts of the LSSR paper. This will be stepwise: get what seem to be enough figures; write the results section of the paper; change some figures, delete some, add some, write a second draft of Results; etc.

  5. Do actual research and write new chunks of the master. We still need code for the TE models. Yimin already has some important robustness (actually, it turned out, lack of robustness) results but we need more sensitivity results. Other forays into unknown territory will emerge as we write the paper. All we know at present is that we have enough for a paper; optimizing the paper remains to be done and will undoubtedly involve some additional calculating and coding.

  6. Have some meetings with some of the chromosome aberration pod to discuss issues common to both pods.

  7. Start to think about another paper. That will involve brand new calculations. Mark may be starting on some possible ones.

yiminllin commented 6 years ago

Hi guys, Hope you are enjoying your vacation. I just finished organizing plottingYimin.R, and I believe the code is cleaner and more readable. I encapsulate the important calculations into some main functions, namely: plottingHelper, multiplotHelper and CIHelper. Your could treat them as blackbox and these three functions simply do all the calculation and plotting required. The code is absolutely clean to work with now, and you could check and modify them easily. I did not add detailed comments, but only brief ones, because I believe the code itself is self explanatory. Some important notes here: I output the plots into "~/Desktop/plots/", and you could change it by changing the parameter of function: plottingHelper, or simply do a substitution in .R file. Also, I didn't modify the dataframe (195, 197 to 180 stuff), but the code could be easily adapted to this change when needed. Generated eps files are in this github folder : plots/updatedVersion/. There will be some minor issues (e.g. tick marks, size of gaps, text on axis), and I will fix them in the future.

rainersachs commented 6 years ago

Happy new year!

Our main HG pod job spring semester 2018, which I outlined 7 days ago above, will be getting figures for, and then writing, a LSSR paper. The elegant code Yimin posted 5 days ago probably shows that no roadblocks remain in principle, so that we may be able to finish this job this semester by a lot of work on the details. I say "probably" because I cannot yet run the code on my computer to check, for a variety of reasons. But I think when Edward gets back we should be able to implement the key first step of the main HG pod job, making YinMin's code on GitHub the dominant master file and subordinating all other files to it. I wrote Edward a detailed plan for this.

Meanwhile I cooked up a more mathematical (and speculative) project that is completely independent of all other URAP projects to date and uses the theory of functions of a complex variable. Yimin said that he might work on the project sometimes as a break from programming the main HG project. I have just posted a .pdf file describing the first few steps. I am sending Yimin two auxiliary .docx files by email because I suspect you guys (like me at the moment) may not find .docx files on GitHub useful (let me know if you can do something with them and help me figure out how to do it).

Looking foeward to seeing you soon, Ray

rainersachs commented 6 years ago

Regarding the functions of a complex variable optional project here are a few further comments. Pretty much all I know about the project is in Section A5 of IEA_optional_complex variable_assignment.pdf that I put in the misc_materials GitHub folder yesterday. That section A5 contains the following.

  1. Some scattered comments and results on incremental effect additivity (IEA) behavior for mixtures. These comments and results are relevant to this optional project only insofar as they motivate the choice and interpretation of IDERs and slope generators, as illustrated in item 2 below. The project concerns IDERs and slope generators. If we can clarify those, using the IDERs in IEA synergy theory will be a follow-up project.

  2. A5 gives some key examples of slope generators which are polynomials all of whose zeros in the complex plane are simple (the Taylor series a_1z +a_2z^2+... in the neighborhood of the zero has a_1 non-zero. For this case a full classification of the following form can and should be carried out for IDER types characterized via locations of the zeros of the slope generator in the complex plane (a) The IDER for non-negative (real) doses is not defined for large doses because it approaches +- infinity at a finite dose. This is the case we have to avoid. It occurs, for example, if all the zeros are located on the imaginary axis but not at the origin. (b) The IDER is defined for all non-negative doses; it approaches infinity for large doses. This is fine. It occurs for example if the slope generator is 1+z, positive real at the origin and having one zero on the negative real axis. (b') The IDER is defined for all non-negative doses; it approaches -infinity for large doses. This is also OK, not as an IDER by itself but as one component of a mixture, provided there are other strongly positive components in the mixture. The -infinity case occurs for example if the slope generator is z-1, negative real at the origin and having one zero on the positive real axis. (c) and (c') The IDER is defined for all non-negative doses; it approaches a finite (real) constant C for large doses. This is also OK. For example if the only slope generator zero is located on the positive real axis and the slope generator is real and positive at the origin, then C is positive. (d) The IDER is zero for all non-negative doses. This occurs iff the slope generator is zero at the origin. Surprisingly, it is a very interesting case, not, of course, as an IDER by itself, but as a component of a mixture some of whose components have positive IDERs.

  3. A5 also contains some examples where the slope generator is a polynomial some of whose zeros have a_1=0. It also contains a few examples where the slope generator is not a polynomial. Basically we want to find enough slope generators to cover all cases of interest in practice (which will most probably require some non-polynomial slope generators). But we need some restrictions which insure that non of the resulting IDERs become infinite at a finite dose. Perhaps focusing attention on the poles and zeros of the slope generators will give us an elegant collection of slope generators. I have started to think about slope generators such as A*cosine(z)+B, where A and B are real. These can have an infinite number of zeros in the complex plane. Also, slope generators that have poles are candidates, as are multi-valued slope generators that need Riemann sheets for their detailed analysis.

  4. A5 also contains some comments on and applications of the qualitative theory of differential equations. Since the optional project is concerned with only one IDER at a time, I expected this to be trivial, but that is actually not the case. The reason is that when IDERs are defined in terms of an AIVP we usually cannot solve the for the IDER explicitly. R ODE integrators should be enough to indicate what we need to know about the qualitative properties of such IDERs but it is better to have proofs using the qualitative theory. Here is an example. Suppose dE/dd= 2+cos(E). Does E reach + infinity at a finite dose? Of course not: 2+cos(E) < 4 and dE/dd = 4 has solution E=4d which is defined for all doses. So the solution to dE/dd=2+cos(E) must be smaller than 4E, thus never reach an infinite value at finite dose. But giving the proof or using numerical integration to show this is a pain in the neck and for less trivial cases the result may not be obvious. We need a book on the qualitative theory of ODE (AKA dynamics) that presents in simple form the relevant results and proofs about attractors etc. even in the 1-ODE case, not just as a special case of many coupled ODE, or an infinte number. There must be such books. Finding one may be tedious.

rainersachs commented 6 years ago

Our plan continues to be making Yimin's code the one and only master script as soon as possible.

But thinking a bit more about this what I then need next is code that I can run in a sandbox in my own RStudio in my home computer. I want to be able to write code that generates almost any kind of plot for IDERs or mixture calculations except for ribbon plots. It should show plots on my sandbox RStudio plot panel and not in any separate files. Then I can quickly reject lots of plots I try; the ones I like best I can turn into .eps files just by export, then I can refine them further in Adobe Illustrator, tweak the plotting code, iterate the procedure, and finally when I am satisfied tell Yimin or Edward to put them into the .ggplot part of the master file. To do that I would need a chunk that is only for me and should be commented out by everyone else. It would have to use plot, not ggplot. It should be as idiot-proof and as cranky computer proof as possible, since it would be run by the one on the other.

I think if Edward or Yimin could add such a chunk as close to the start of Yimin's master as possible, and put in it 3 or 4 representative plots that translate ggplot into plot I could probably write all the rest just by looking at Yimin's other ggplot files and translating them into plot. I would not have to ask either Yimin or Edward to constantly be sending me .eps files for tweaking. Only when we are closer to finished would my chunk be erased and all the information be in ggplots in the master.

Is that feasible?

In haste, Ray

rainersachs commented 6 years ago

Hi Yimin:

Your code through line 276 runs and gives me almost all the functionality I need. Remaining improvements needed in order of decreasing importance:

  1. A few examples of gg2plot translated into plot and put somewhere before line 276
  2. Hardest: make I(d) run up to dose 1 Gy whenever possible. This means avoiding error messages from uniroot sometimes when the dose points are not chosen just right. See if you can find doses that give error messages from uniroot. Then see if you can fix them or there is a maximum dose < 1 Gy above which nothing works.
  3. Give me easier control over the dose ranges of calculations and plots.
  4. Cosmetics but not functionality
  5. Optional: give me the ability to run Monte Carlo for 2 or 10 sample paths rather than 1000
rainersachs commented 6 years ago

Yimin will see if reducing the number of dose points for a ribbon plot down to as little as 10 or 20 speeds up the master script without changing the plots markedly except perhaps at doses so low they will not be noticeable on most of our plots and are lower than almost every data point (i.e. are < 0.01 Gy).

rainersachs commented 6 years ago

Next week Claire will continue to study Yimin's code line by line. Yimin will work on some plots and also see about speeding up the code. See you guys tomorrow! Edward is waiting for when Yimin's code is a bit more settled.

rainersachs commented 6 years ago
  1. The HG pod got a lot done this week, including Yimin's speeding up of the Monte Carlo calculations. Most of you will write minutes of our meetings and I will comment on those when they appear on GitHub.

  2. Sunday night 2/18/2018 Yimin's code (plotting_Yimin.R) on GitHub will become the master file and be turned over to Edward for quality control. Yimin may or may not have time to add a few last-minute changes before then; either way is fine. After Sunday all changes to this master should go through Edward, who will work on debugging it and, with Yimin's help, adding various bells and whistles. Yimin and Edward should cooperate closely on this.

  3. We are postulating 2 types of HZE models, NTE (i.e. TE-NTE since TE are always assumed and always dominate at large doses) and TE (i.e. TE-only, with NTE assumed negligible at all doses). At the moment we only have graphs, variance-covariance matrices, and Monte Carlo CI calculations for our NTE HZE models. Eventually we will need graphs etc. for our TE only models as well. This will be relatively easy and can be postponed because it will be needed only for the "major" report/paper about a year from now, not for the "minor" paper/report that should be ready, with luck, this semester.

  4. The HG pod minor paper is progressing well. I hope to have Title, Abstract, Introduction, and Methods sections ready in a couple of weeks. I will then send it to you guys for information and for your criticisms.

Mebert314 commented 6 years ago

We went over my results on the sham mixture principle and m&m in Lam’s independent action theory in the cases where the IDERs (n=2): a) Started at 0, and b) Started at 0, ended at finite value (in this case 1) With n=2, case (a) is identical to simple effect additivity. I will check if it holds for n>2. For case (b), with IDERs E1=1-exp(-d) and E2=1-(d+1)^-2, we discussed my results showing that: E1 satisfies sham (e.g. 50-50 mixture), E2 doesn’t satisfy sham, and that m&m doesn’t hold. From these examples, it seems that Lam won’t help us prove that sham implies m&m, so we should drop it for now. Moving forward, my assignments are to compare Lam with the other synergy theories, and see if/how/when we can use it. Also, check if Lam simplifies to simple effect in cases where the IDERs: i) Start at 0, go to infinity ii) Start at –infinity, go to 0. with n>2 (I suspect that it doesn’t).

rainersachs commented 6 years ago

As of now, the plotting Yimin script on GitHub is turned over to Edward for quality control. All changes, major or minor, should go through him to avoid confusion. The GitHub script, perhaps under a different name, is now the master file for the project.

3 immediate jobs are the following. 1). Yimin's changes that speeded up the Monte Carlo calculation should be added to the script. 2). the LET values L in the data bases for Fe56 at 600 MeV/u have to be changed from values like 173 or 193 or 196 to all be 185 keV/micron; this change has to be implemented throughout the rest of the script wherever appropriate. 3). All the output of the high LET NTE model, like figures, variance-covariance matrices, p=values from regression summaries, etc. has to be duplicated for the high LET TE model. This third job can actually be postponed for a while because it is not needed for our next (minor) paper, only for the major paper to be started later. But the other 2 are need for the minor paper and I will soon have a version of that with more details on what output is needed.

Edward: I will keep my sandbox synched with the master by hand. When commenting the script you need not worry much about comments that relate the script to the radiobiology literature or our upcoming papers. When you are close to finished, let me know and I will send you a version that includes those kinds of comments. I will have to ask you to add various ggplot2 figures to the master as I gradually write more of the next paper (the minor one). More on that when we next meet.

eghuang commented 6 years ago

I've been looking through Yimin's script this weekend. Several items I've noticed:

  1. The plotting script lacks the updated names and rewritten modeling functions I added to HGsyngergyMain.R last fall. I will add these as I merge our two files.
  2. Hardcoded variables (i.e hardcoded data pipelines). This makes it very difficult for us to make changes to the data or to have other researchers examine/run our code.
  3. Lots of repeated code, which has led to a total script length of ~1200 lines - I'm not yet convinced that this is necessary, but I may be wrong.
  4. Lack of commenting in the plotting functions: this is not of prime importance but it will delay my familiarization with the script as we use it in the future to produce plots for the TE models.

Items 1 and 2 are of much greater importance than 3 and 4. Since the plotting is not very relevant to the "computational implementation" of the script, I am not sure whether refactoring the plotting code should be prioritized. Whatever the case, I will be forced to refactor at least a few parts of Yimin's code as I combine his plotting code with my updated data and model code since variable names and data are inconsistent between the two. This minor refactoring will take care of item 2 on Ray's latest post (i.e. updating the LET data).

ACTION ITEMS:

Yimin - If convenient, please add additional comments to your main three plotting functions to clarify what certain blocks are doing (especially obscure code from other libraries or esoteric code from Stackoverflow threads). This is of lesser importance than pushing your Monte Carlo changes as described by Ray above. Ray and Yimin - Thoughts and comments on the above would be appreciated.

rainersachs commented 6 years ago

Hi: As regards your item 1, I have a minor suggestion which you can ignore or accept as you choose. Instead of using all lower case names allow upper case iff a well established acronym is involved.

  1. As regards 2. I agree that we have to change that. We are hoping other groups will not only use but also modify the code to test their own opinions on modeling and to correct our possible coding mistakes and our inevitable conceptual mistakes. We should make that easy for them.

  2. As regards 3, Yimin left a lot of extra stuff in at my request.

  3. Your 4 is not of high priority. However you and I will have to work out some way I can get figures as we go. I'm thinking the following but you may have better suggestions: Before putting any changes on GitHub you test that basic functionality remains as regards my using the data frame and being able to add (in my local sand box only, not in the master; your master should allow this but not even mention it, let alone implement it) plots of my own choosing (none of those will use the Monte Carlo part of the script). Sporadically I will send you a draft figure, you will add a ggplot2 figure or a placeholder to the master and I will make the figure in my sandbox. Figures which use the MC part of the master I will leave entirely to you and just keep tweaking by email .eps versions you keep sending me by email.

  4. An hour ago I finished the penultimate draft of Title, Abstract, Introduction, and Methods Sections of the minor paper. I will sleep on it and send around before we meet this week. I think you should read the result fairly carefully before you you get too far into revising the master script.

  5. I think we might be able to submit the minor paper this semester and am making that my top priority. Your top URAP priorities should be, and as far as I can see are, different. With a comparatively small increment of effort we should I think be able to reduce this discrepancy to almost nothing.

  6. As I keep saying, I think URAP delays are much preferable than letting URAP interfere with a student's real academic or personal life.

rainersachs commented 6 years ago

While writing the minor paper I ran into a discrepancy. For the minor paper I want to use the LET Values for individual ions as given in papers by Alpen (or by Chang respectively) for 20th century (and recent data respectively) . However the experimental methods were not quite the same: Alpen measured and used the LETs the ion tracks have when entering the accelerator beam; Chang measured and used the somewhat higher LETs the ion tracks have when they enter the mouse after being slowed down by intervening matter. In prospective major papers a year from now or so this discrepancy will have to be dealt with somehow. At the moment I don't know what method will be chosen. Because it concerns data physical parameters rather than parameters obtained by regression during modeling the ultimate authority over what method to use is in principle up to the experimentalists not us. In the meantime we should keep in mind that we may eventually have to change some parameters in the data input for our scripts (in addition to adding new data, and perhaps to correcting recording errors for the old data which are easier to deal with because they do not involve differences in experimental techniques,

rainersachs commented 6 years ago

I figured out the way we should handle the discrepancy I discussed here 5 hours ago. It will probably be acceptable to everyone who studies any of our reports or papers. I will just implement it even for the minor paper; and thereby put the whole issue permanently to rest. It will cost me about 8 hours and it could be up to a month before I find the time. But there are already manipulations in the code, where we use physics equations to fill in a few extra columns of dfr given some of the other columns that have nothing to do with any actual data but merely depend on physics quantities like Z and LET and Mev/u. The method of correcting for the LET discrepancy due to different experimental Alpen vs. Chang protocols is very similar to the method already used to generate extra physics columns. Instead of generating extra physics columns we will correct the LET column to correct for the Alpen vs. Chang discrepancy. Edward will certainly need to incorporate the present physics number manipulations somewhere (in R or .xlsx or .csv manipulations?) so basically he will just need to eventually add one extra calculation of the same type. So nothing relevant will have to wait for my detailed implementation.

I hope ;-)>>

rainersachs commented 6 years ago

Hi Yimin: do you want to meet this Thursday or skip this Thursday? In any case Edward may want to ask you for some details at some point and you, Mark, and Claire have the separate complex variable project to work on.

yiminllin commented 6 years ago

How about I arrange a meeting with Edward this week (or by email)? And we could meet next Thursday after I talk with Edward.

rainersachs commented 6 years ago

Sounds good to me! Edward may not yet have a lot to ask but I think it will be useful if you and he can discuss plans in person or by email. See you Thursday 8 march around 2 or 2:15. -ray

rainersachs commented 6 years ago

Hi Yimin:

Unfortunately I have an urgent telecon just scheduled on another matter tomorrow and have to cancel our 2:15 meeting tomorrow. See you Thursday the 16th!

rainersachs commented 6 years ago

Hi Peter:

Sorry that I only realized while talking to Hada how severe the confounding by complex CA in the human lymphocyte data set is. I think we just have to write off the human lymphocytes as a sunken cost at least for the time being. On the bright side: (a) We can focus on the fibroblast data set, where such confounding is almost absent. Since Hada now tentatively plans to add more fibroblast experiments, the fibroblast data set will probably become substantially more informative. (b) The developments inadvertently illustrated a typical feature of research. There are always lots of mistakes, including mistakes in judgement like the one I made, from which one can learn. Cranks also make lots of mistakes, but won't admit, even to themselves, that they are wrong. So they can't learn from their mistakes and thus never get anywhere.

eghuang commented 6 years ago

This thread is quite long, as noted by Ray. I am archiving the thread. New updates and discussion should be directed to: https://github.com/eghuang/NASAmouseHG/issues/7