rainersachs / URAP.CA.obsolescent

Analyzes data on Chromosome Aberrations (CA) induced by accelerator-simulated Galactic Cosmic Rays (GCR) in ongoing NASA-sponsored experiments. The emphasis is on synergy theory. URAP project, fall 2017.
GNU General Public License v3.0

fibroblast_assignments & results_2 #3

Open rainersachs opened 6 years ago

rainersachs commented 6 years ago

Starting a new issues thread for spring semester 2018 and beyond because the old one is too long.

We need consistency with the recent paper by Ham et al. on the fibroblast data, so I am putting it here. 18sp_final_RR_Ham.pdf

Andy and Peter: in Ham et al., please study in some detail Fig. 7, Fig. 8, their captions, and the sub-section "Synergy analyses for two-ion and six-ion mixtures: 95% CI for I(d)" that contains them. From the v2 chunk of the script Andy posted Sunday, I got a much broader ribbon, with much higher upper limits than in Fig. 7. Are you also getting much broader ribbons? If so, that is a major discrepancy that we need to resolve ASAP. Ham's .Rmd is in this repository as URAP.CA/Obsolescent/CAfibroGH.Rmd and is probably in Andy's sandbox. Do you get the same vcov Ham does? Are you sampling from it the same way? If you are getting the discrepancy, let's try to resolve it soon so we can get Ham's help if we are stuck.

There are 2 related problems.

1) I don't see any comparison in Andy's v2 chunk between the two different ways to compute the 95% CI for I(d) (the two panels of Fig. 8 in Ham et al.). Is that comparison in chunk v1 or v3, or is it missing?

2) While running Andy's chunks I once got a figure containing the same error Andy recently corrected: a ribbon fell entirely below the I(d) curve instead of enclosing the I(d) curve, as it must. That mistake is still lurking somewhere in Andy's script (though perhaps commented out), and it should be cleaned out before it causes more confusion. More generally, the script needs a lot of quality control.
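The two ways of computing the ribbon can be illustrated with a minimal Python sketch (the group's scripts are in R; Python is used here only for a self-contained illustration). It assumes that the two panels of Fig. 8 correspond to the narrow (correlated-parameter) and broad (uncorrelated-parameter) ribbons mentioned later in this thread. The curve E(d) = a*d + b*d^2 and the variance-covariance matrix are hypothetical, not Ham's fitted values. The correlated version draws (a, b) jointly from a bivariate normal using the full vcov; the uncorrelated version keeps only the diagonal. The loop also applies the sanity check from point 2): each ribbon must enclose the central curve.

```python
import math
import random


def sample_params(mean, vcov, correlated, rng):
    """Draw one (a, b) pair from a bivariate normal with a 2x2 vcov.

    If correlated is False, the off-diagonal covariance is ignored,
    which mimics the broad (uncorrelated-parameter) ribbon.
    """
    var_a, cov_ab = vcov[0]
    var_b = vcov[1][1]
    if not correlated:
        cov_ab = 0.0
    # Cholesky factor of the 2x2 covariance matrix.
    l11 = math.sqrt(var_a)
    l21 = cov_ab / l11
    l22 = math.sqrt(var_b - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2


def curve(a, b, d):
    """Hypothetical two-parameter dose-effect curve standing in for I(d)."""
    return a * d + b * d ** 2


def ci_95(mean, vcov, d, correlated, n=20000, seed=1):
    """Monte Carlo 2.5% and 97.5% quantiles of curve(a, b, d)."""
    rng = random.Random(seed)
    values = sorted(
        curve(*sample_params(mean, vcov, correlated, rng), d) for _ in range(n)
    )
    return values[int(0.025 * n)], values[int(0.975 * n) - 1]


mean = (0.5, 0.2)                        # made-up parameter estimates
vcov = [[0.01, -0.009], [-0.009, 0.01]]  # made-up, negatively correlated
for correlated in (True, False):
    lower, upper = ci_95(mean, vcov, d=1.0, correlated=correlated)
    # Sanity check from point 2): the ribbon must enclose the central curve.
    assert lower <= curve(*mean, 1.0) <= upper
    print("correlated" if correlated else "uncorrelated", lower, upper)
```

With these made-up numbers the uncorrelated ribbon is roughly three times wider, because ignoring a negative covariance inflates the variance of a*d + b*d^2. In a real ribbon plot, the same quantiles would be computed over a grid of doses for the mixture I(d).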

Thanks! See you guys Friday, I hope.

rainersachs commented 6 years ago

Peter: you have two files named "MIXTE". If they should now be deleted, please delete them or ask me to. If they should be retained, please give them more informative titles and move them to the Lymphocyte subfolders if you can, or perhaps delete them and then add them in the subfolders.

rainersachs commented 6 years ago

Here are some issues for summer 2018 and perhaps beyond.

  1. Some material from before October 2018 has been permanently deleted to avoid clutter. Much of what is left is now obsolescent. It is scattered in our scripts, in our sandboxes, and in the repository DavidHam97/GCRFibroCA.
  2. We will probably never model Hada lymphocyte data. This holds especially for human lymphocyte data due to the confounding factor of many complexes in that data which, in my opinion, precludes sound extrapolations from 2, 3, or 4 color FISH to whole genome equivalents (i.e. essentially to 24 color FISH). See recent papers by Cornforth and coworkers for some of the problems.
  3. Our short range goal is a UCB group paper on the published 82-6 Fibroblast data in Hada's 2014 and 2016 papers with Cucinotta and others. There are a few corrections to those papers that are already incorporated into our scripts. The paper will be written for the journal LSSR.
  4. In the long run, we should aim for a major paper, with Hada as first or last author, on all Hada 82-6 fibroblast data, including some very interesting data on mixtures. In my opinion this is a very important data set. I offered that we would write a first draft, and Hada agreed in principle. I think we should postpone the paper until: (a) Hada and her modeling team, including Plante, Slaba, and Wu, have a chance to flesh out their take on the newer data in a paper; and (b) we can find time to do the writing and the specific calculations needed. So the delay will probably be long.
  5. In the short run I need to use Peter's Monte-Carlo scripts to generate sample figures for the minor paper.
  6. I or a student volunteer should double-check our and others' calculations of the Z^2/beta^2 term in the (probably obsolescent) amorphous track structure Katz-type models used for this 82-6 data set in some published papers. Basically we have to use Barkas' formula. The sooner the better, so we don't have to make too many corrections. Ham is eager to get the results for his SURF project as well. I hope to have time early in July to do that, barring setbacks in other projects.

rainersachs commented 6 years ago

There are problems with the files in the GitHub repository rainersachs/URAP.CA; I outline some of them below. Peter, please decide, with input from Andy and perhaps Dae as needed, whether to shut down all work of the URAP CA pod until the fall semester starts, or to address the problems gradually during the summer. What I need to proceed with writing the paper is an R (not Rmd) script that makes lots of plots, using plot() rather than ggplot2, so that I can choose the small minority of those plots that are candidates for the paper. This script needs to be in the rainersachs/URAP.CA repository on GitHub. For the minor paper, NASA guidelines mean we will need .R versions of everything in another repository, named after and dedicated to the minor paper and frozen (apart from correcting errors) at the time of publication. So the new repository will refer only to the data we are using now.

URAP.CA will remain as a long-run repository with lots of data coming in. In the long run .Rmd files may be best and could be used for URAP.CA, but for the minor paper we can't use them.

One problem is that the Monte Carlo for the 95% CI ribbon plots may take too long. In the very similar data set of the HG pod we can now draw a Monte Carlo sample within a few minutes, but that seems not to be the case for the files in the GitHub repository URAP.CA. Is that because the URAP.CA files fail to use standard functions in standard R packages, and instead use customized functions that have the same functionality but are not as fast? Or has the CA Monte Carlo become much faster since I last checked? Another problem is that the plotting and the Monte Carlo are too intermingled. We certainly do not want to rerun the Monte Carlo every time we make a plot for the paper, let alone every time I want to experiment with which plots to use.

Peter had a very nice solution for that: run the Monte Carlo once and record the outcome in .csv files for use in all plots. It had the wrong 795 entry for a Z^2/beta^2 value. I think that after correction we might be able to use that method for the minor-paper repository. However, while writing the paper we may need more flexibility. A minor problem is that rainersachs/URAP.CA does not seem to be consistent about using nls() versus nlsLM() from the minpack.lm package. I installed minpack.lm so I can work with either, but I still cannot get the files to run; they don't seem to fit together. Almost all of them seem to use nlsLM(), but in the Graphs subfolder there is a file with the non-informative name "LAPTOP-QH0F5KBF.Rhistory" that uses nls() instead.
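The run-once-then-cache pattern can be sketched in a few lines. This is a minimal Python illustration, not Peter's actual code: the column layout (dose, lower, upper) and the function names are hypothetical. The expensive Monte Carlo writes its quantiles once, and every plotting script only reads the file.

```python
import csv
import io


def write_ribbon_csv(stream, doses, lower, upper):
    """Write precomputed 95% CI bounds, one row per dose (hypothetical layout)."""
    writer = csv.writer(stream)
    writer.writerow(["dose", "lower", "upper"])
    for row in zip(doses, lower, upper):
        writer.writerow(row)


def read_ribbon_csv(stream):
    """Read the bounds back as three lists of floats, ready for plotting."""
    doses, lower, upper = [], [], []
    for row in csv.DictReader(stream):
        doses.append(float(row["dose"]))
        lower.append(float(row["lower"]))
        upper.append(float(row["upper"]))
    return doses, lower, upper


# In practice these would be files (e.g. a hypothetical "two_ion_ribbon.csv");
# an in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_ribbon_csv(buffer, [0.0, 0.1, 0.2], [0.0, 0.01, 0.03], [0.0, 0.04, 0.08])
buffer.seek(0)
print(read_ribbon_csv(buffer))
```

Keeping the writer and the reader in separate scripts means a plot can be redrawn as often as needed without rerunning the sampler, and the frozen .csv files can later serve the minor-paper repository.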

Peter: if you have questions before deciding whether to shut down completely for the summer, let me know, preferably both by email and in the URAP.CA repository. No hurry as far as I am concerned. Thanks, Ray

rainersachs commented 6 years ago

Met with Peter. As time permits he will

(a) aim for a self-contained sub-repository, all R (not .Rmd) and all plot() (not ggplot2), suitable for removal from this repository into a private Sachs repository for official NASA use with the minor paper, which requires the following restrictions. The private Sachs repository will be named after the minor paper. The sub-repository will contain a (possibly cleaned-up) R version of Andy's basic .Rmd file, with all Monte Carlo parts commented out. Using nlsLM() from the minpack.lm package is OK. The Monte Carlo part of the private Sachs repository will consist of two .csv files that can be used to run either the specific 2-ion 50-50 mixture as before or the rep(1/6, 6) 6-ion mixture as before. The sub-repository will be frozen as of the publication date, except that Sachs will correct it if errors are found. In addition to Andy's basic file, the sub-repository will contain at least: one example plot of a one-ion DER; the plot of the 2-ion mixture with I(d) and S(d); the corresponding plot for the 6-ion mixture; the ribbon plot for the narrow 95% CI version of the 2-ion mixture; and the broad (uncorrelated-parameter) 95% CI version of the 6-ion plot.

(b) rearrange this URAP.CA directory in any way Peter and Andy think logical, using .Rmd files as the master versions. Delete lymphocyte material everywhere and delete the obsolescent sub-folder. I just took out the minor files sub-folder.

(c) As time permits, look into speeding up the Monte Carlo by a factor of at least 10. This has lower priority than (a), even though it is more important scientifically and for long-run use.

After the sub-repository has migrated to the new repository, this URAP.CA repository should be under student control (e.g. Andy and/or Peter), not Sachs' control. In principle there should be co-owners of the repository: Sachs for long-range continuity, and at least one student authorized to make all decisions about what goes where in URAP.CA.

rainersachs commented 6 years ago

I forgot to add that some plot examples could be very rough, or could even use ggplot2 if plot() is too unfamiliar. I think I could translate them into plot() myself without much trouble, and certainly could with a little help.

rainersachs commented 6 years ago

URAP CA and mouse pods.pdf

rainersachs commented 6 years ago

Here is a .pdf on programming and major-paper plans for fall semester 2018

rainersachs commented 6 years ago

programming_flowchart

rainersachs commented 6 years ago

Andy and Peter: the figure above had a mistake. Here is a corrected version, showing URAP CA as we will set it up and the mouse HG file suite as already implemented: programming_flowchart_corrected

Please figure out, from the mouse pod Monte Carlo or the 2-ion Monte Carlo in the CA script on GitHub, how the Monte Carlo calculation can be sped up by a very large factor, and implement that while you are switching over to the new flowchart.

Please let me know what hours are convenient for you for regular meetings this semester.

Thanks, Ray

rainersachs commented 6 years ago
  1. rainersachs/URAP.CA
     3.1. Wait for Edward to clean up the Monte Carlo code; then leave it to Andy and Peter (all three own it).
     3.2. Go over to the .csv structure.
     3.3. Get the full data from Megumi Hada on one-ion data for Si and the three Fe, in the form of the Excel file EllieNaSAgrant/science/exp_data_summary July 16 2018.
     3.4. Make more plots.
     3.5. Make a private Sachs GitHub repository inside the rainersachs organization named "Hada_2019_paper".

rainersachs commented 6 years ago

The following is here to make sure we don't forget the following numbers and arguments.

The zero-dose data show a background prevalence Y_0 = 0.00071 instead of the larger value 0.0017. Details are in Dae's 2018 Radiation Research paper. Please change the script accordingly. Y_0 is so small that the change will produce negligible changes in any figure, except perhaps a figure that zooms in very closely on very small doses. However, it will actually strengthen the paper substantially. A main message of the paper will be that this is the best data set of all for seeing whether HZE ions induce NTE in tumor or tumor-surrogate endpoints, a question important for the very low doses and dose rates astronauts encounter in interplanetary space. One reason the data are so well suited for that purpose is that the background is so low. Since background and NTE are two competing explanations for the high prevalence observed at the lowest non-zero dose points, even a small decrease in a small Y_0 strengthens the case for NTE disproportionately.
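The arithmetic behind "even a small decrease in a small Y_0 strengthens the case for NTE" can be sketched as follows. Only the two Y_0 values come from the comment above; the observed low-dose prevalence is a made-up placeholder, not a Hada data point.

```python
Y0_OLD = 0.0017   # earlier background prevalence
Y0_NEW = 0.00071  # background prevalence from the zero-dose data


def radiation_excess(y_observed, y0):
    """Prevalence above background that the radiation must account for."""
    return y_observed - y0


def background_share(y_observed, y0):
    """Fraction of the observed prevalence that background alone could explain."""
    return y0 / y_observed


y_low_dose = 0.005  # hypothetical prevalence at the lowest non-zero dose
for y0 in (Y0_OLD, Y0_NEW):
    print(y0, radiation_excess(y_low_dose, y0), background_share(y_low_dose, y0))
```

With this placeholder value, the share of the low-dose prevalence that background could explain drops from about a third to about a seventh, which is the sense in which a small change in a small Y_0 shifts the balance toward NTE.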

rainersachs commented 6 years ago

Andy and Peter: please read the above, from the corrected figure down to here.

I have now added to rks_DataAndInfo.R a couple of commented lines that show how to calculate some numbers we will need when updating our input .csv files as Hada sends more data about extra ions. These lines should also be added to the CA version of rks_DataAndInfo.R when it is written.

rainersachs commented 6 years ago

Attached plots: new_two_ion_no_var, six_ion_old_var, six_ion_new_var, old_two_ion_no_var.

Edward just emailed me the following: "2. I spent today refactoring the rest of the Monte Carlo code in the CA script. The two-ion, no covariance matrix ribbon plot now takes exactly one minute to run and the six-ion with covariance ribbon plot takes about 2.3 minutes to run. The last figure, a six-ion no covariance ribbon plot, had incomplete plotting code so I did not refactor the Monte Carlo there, but I looked over it closely and found that my previous approach can be easily adapted to it. I attach plots by the old and new code below for comparison. I noticed no distinguishable difference between the two-ion plots and small but perhaps inconsequential differences between the six-ion plots. The new changes are on URAP.CA." I haven't checked, but unless the random-number-generator seed was held fixed, the small changes Edward mentions could just be the typical differences between any two Monte Carlo calculations.
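The seed caveat can be checked directly: two runs of the same Monte Carlo with different seeds differ slightly, while reusing a seed reproduces the result exactly. A minimal Python sketch (Python's random.Random stands in for R's set.seed(); the toy estimator is hypothetical):

```python
import random
import statistics


def mc_estimate(seed, n=10000):
    """Toy Monte Carlo: mean of n draws from a normal(1, 0.2) distribution."""
    rng = random.Random(seed)
    return statistics.fmean(rng.gauss(1.0, 0.2) for _ in range(n))


same_a, same_b = mc_estimate(seed=42), mc_estimate(seed=42)
other = mc_estimate(seed=7)
print(same_a == same_b)     # same seed, same draws: True
print(abs(same_a - other))  # different seeds: small, typically nonzero difference
```

Comparing old and new code with a fixed seed separates genuine differences introduced by refactoring from ordinary sampling noise, although a refactor that changes the order of random draws can still change the output.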

He added: "I would advise Andy and Peter to follow the coding, organization, and documentation style in the mouse repository. They can feel free to ask me for assistance if the style in the mouse code is unclear." I agree with this suggestion. I am not sure what the lower fig. above is.

rainersachs commented 6 years ago

18fa_Cucinotta_NTE_for_private_astronauts.pdf

Here is a very up-to-date paper that deals with almost all of the modeling issues in our calculations, uses almost all the acronyms with which most of you are by now familiar, and also gives the space-travel background for the modeling. I suggest you read the article as deep background for the projects. Among the terms that we haven't discussed or used much are "Relative Biological Effectiveness" (RBE) and "Quality Factor". They are not rigorously defined, but even as fuzzy concepts they are quite important, so you might want to look them up.

rainersachs commented 6 years ago

Elementary_picture_of_LET.pdf, Ballarini_2008_New_J._Phys._10_075008.pdf

Here is a bit I wrote on LET because a couple of you asked about it. The Ballarini .pdf has, in the lower right corner of Fig. 1 and in Fig. 2, visual examples of track structure models more sophisticated than the very naive straight-line track structure model I used in my .pdf.

rainersachs commented 6 years ago

Thanks for your revision of Graphs.R yesterday! I commented out the Monte-Carlo parts and that gives me exactly the type of 1-ion graphs I need for diagnosis. Nice job!

One minor quibble is that there is an option to calculate an "average" of two or more IDERs. I suspect that by "average" you mean the following: Define "Peter-pointwise-additivity" as PPA = (1/2)[E_1(d)+E_2(d)], where d is a dose. But what dose? If we are talking about a mixture experiment with total mixture dose 2d then PPA is just simple effect additivity SEA. Otherwise, as far as I can see, PPA gives you a curve which indeed lies between the E_1(d) and E_2(d) curves and will often be a rough approximation to incremental effect additivity (IEA) with r=1/2 but has no relevant interpretation at all, though the curve looks much better than the SEA baseline.
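To make the PPA-versus-IEA comparison concrete, here is a minimal numerical sketch in Python. It assumes IEA for a mixture with fractions r_j is given by the ODE dI/dd = sum_j r_j * E_j'(E_j^{-1}(I)), I(0) = 0, as in the synergy-theory papers this thread builds on; treat that form as an assumption here. The two pairs of one-ion curves are hypothetical. A fixed-step Runge-Kutta solver with bisection inverses is used only to keep the example self-contained.

```python
def inverse(f, y, hi=1e3, tol=1e-12):
    """Dose D with f(D) = y for increasing f on [0, hi], by bisection."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ppa(curves, dose):
    """Pointwise average (1/N) * sum_j E_j(dose) of the one-ion curves."""
    return sum(effect(dose) for effect, _ in curves) / len(curves)


def iea(curves, r, dose, steps=1000):
    """I(dose) from dI/dd = sum_j r_j * E_j'(E_j^{-1}(I)), I(0) = 0, via RK4."""
    def slope(i_val):
        return sum(
            rj * deriv(inverse(effect, i_val))
            for rj, (effect, deriv) in zip(r, curves)
        )

    h = dose / steps
    i_val = 0.0
    for _ in range(steps):
        k1 = slope(i_val)
        k2 = slope(i_val + 0.5 * h * k1)
        k3 = slope(i_val + 0.5 * h * k2)
        k4 = slope(i_val + h * k3)
        i_val += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return i_val


# Hypothetical curves, each given as (E_j, dE_j/dd).
linear = [(lambda d: 2 * d, lambda d: 2.0), (lambda d: 4 * d, lambda d: 4.0)]
curved = [(lambda d: d, lambda d: 1.0), (lambda d: d * d, lambda d: 2 * d)]
print(ppa(linear, 1.0), iea(linear, [0.5, 0.5], 1.0))  # both 3.0, up to rounding
print(ppa(curved, 2.0), iea(curved, [0.5, 0.5], 2.0))  # 3.0 versus about 3.07
```

For straight-line curves PPA and IEA coincide; for the curved pair they differ, though only modestly at this dose, which fits the description of PPA as a rough approximation to IEA with r = 1/2.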

More generally, "adding" curves is not a self-explanatory idea, and there are many different ways "adding" can be made precise: not only PPA, SEA, and IEA but also many other methods. For example, Berenbaum's linear isobole method for monotonically increasing curves uses the inverse functions of the E_k to get an inverse function for the "added" curve, and typically leads to results slightly different from IEA. I suspect our entire approach might have a simpler and more general formulation if one really worked with function spaces, and perhaps with some notion of a convex region in a function space, but maybe that is just a red herring.
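Berenbaum's linear isobole (Loewe additivity) can be stated as follows: a mixture delivering dose d_j of each component j produces the effect level E that satisfies sum_j d_j / E_j^{-1}(E) = 1. A minimal Python sketch, assuming monotonically increasing hypothetical curves and solving for E by bisection:

```python
def inverse(f, y, hi=1e3, tol=1e-12):
    """Dose D with f(D) = y for increasing f on [0, hi], by bisection."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def isobole_effect(curves, doses, e_max, tol=1e-10):
    """Effect E solving sum_j d_j / E_j^{-1}(E) = 1 (linear isobole).

    Each term decreases as E grows, so E can be found by bisection on
    (0, e_max]; e_max must exceed the answer.
    """
    def isobole_sum(effect):
        return sum(d / inverse(f, effect) for f, d in zip(curves, doses))

    lo, hi = 0.0, e_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if isobole_sum(mid) > 1.0:
            lo = mid  # doses exceed the isobole for this effect: effect is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical 50-50 mixture at total dose 2, i.e. d_1 = d_2 = 1.
linear = [lambda d: 2 * d, lambda d: 4 * d]
curved = [lambda d: d, lambda d: d * d]
print(isobole_effect(linear, [1.0, 1.0], e_max=100.0))  # 6.0 for straight lines
print(isobole_effect(curved, [1.0, 1.0], e_max=100.0))  # (3 + sqrt(5)) / 2
```

For straight-line curves the isobole result equals the simple sum of the one-ion effects. For curved DERs it generally differs from that sum and from IEA, which is why "adding" has to be defined explicitly.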

rainersachs commented 5 years ago

Message to Peter, and a test of my notifications. Thanks! This is an important type of graph. It caught me with my pants down because I got no notification; I have now arranged to be notified from now on. I assume the figure code will still work when Andy adds extra data rows? Can I now download the files in the separated folder to my sandbox and add my own graphs? Some comments and questions on the figure follow.

  1. Oxygen55 point at CA 12+??
  2. Oxygen350 point at dose 0??
  3. No SEA, to reduce clutter (shoot down earlier).
  4. Why a ribbon? It is not really informative until we have actual mixture data.
  5. Needs error bars somehow. Can one make point area proportional to error-bar length? Use a separate panel with just one one-ion DER and its error bars to help the reader translate from error-bar length to point area.
  6. In any case, make the points larger and choose better color contrasts, e.g. for 6 lines: black, red, green or aquamarine2, bright brown-orange-yellow, dashed black, dashed red. Identify corresponding points somehow, e.g. open for dashed lines, otherwise solid.
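One way to make point area proportional to error-bar length is to scale each marker's linear size by the square root of its error-bar length, because area grows as the square of linear size. A minimal Python sketch with hypothetical error-bar lengths and a hypothetical maximum size. In base R the resulting factors would be passed to the cex argument of plot() or points(), which scales symbol size linearly.

```python
import math


def marker_scales(error_lengths, max_scale=3.0):
    """Linear size factors whose squares (marker areas) are proportional to
    error-bar length; the longest error bar gets max_scale."""
    longest = max(error_lengths)
    return [max_scale * math.sqrt(e / longest) for e in error_lengths]


scales = marker_scales([0.5, 2.0, 4.5])  # hypothetical error-bar lengths
print(scales)
```

Scaling linear size directly by the error-bar length would make areas grow with its square and exaggerate the largest uncertainties, which is why the square root is used.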

rainersachs commented 5 years ago

automate_aggregating_CA_data3.1.2019.pdf

Hi Peter: here are two R functions that I need before trying out a second panel for your multicolor dots figure. No hurry, but I will suspend my paper writing until I get those functions. Thanks! Ray

rainersachs commented 5 years ago

Hi Peter and Kulunu! Here are some items for tomorrow's agenda. No hurry on any of this except the 2/19 deadline for URAP summer. Peter:

  1. When subsetting the full 1-ion database, the first step should be removing the zero-dose points, aggregating the zero-dose integers, and computing the zero-dose prevalence with its variance. That could even be done in Data.R if convenient.
  2. How is the low LET modeling and calibration going?
  3. Remind me to ask about how you control dot size on your multicolor graphs
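Step 1 of Peter's list can be sketched as follows: pool all zero-dose rows, sum their counts, and compute the pooled background prevalence with a binomial variance p(1 - p)/n. The field names (dose, cells, aberrant), the toy counts, and the use of the binomial variance are assumptions for illustration, not the real 1-ion .csv columns or data.

```python
def split_background(rows):
    """Remove zero-dose rows, pool their counts, and return
    (nonzero_rows, background_prevalence, background_variance)."""
    zero = [r for r in rows if r["dose"] == 0]
    nonzero = [r for r in rows if r["dose"] != 0]
    cells = sum(r["cells"] for r in zero)        # total cells scored at dose 0
    aberrant = sum(r["aberrant"] for r in zero)  # total aberrant cells at dose 0
    p = aberrant / cells
    variance = p * (1 - p) / cells               # binomial variance of p
    return nonzero, p, variance


rows = [  # hypothetical toy counts
    {"dose": 0.0, "cells": 2000, "aberrant": 2},
    {"dose": 0.0, "cells": 3000, "aberrant": 3},
    {"dose": 0.1, "cells": 1500, "aberrant": 12},
]
nonzero, y0, var_y0 = split_background(rows)
print(len(nonzero), y0, var_y0)
```

Pooling the counts before dividing, rather than averaging per-experiment prevalences, weights each experiment by the number of cells scored.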

Kulunu:

  1. Assume the paper will target PLoS Biology, not PLoS Computational Biology.
  2. See if you can suggest a few alternative titles for the paper. I decided on one possibility and we could compare.
  3. Please get a few sample PLoS Biology papers for both of us; the ones you linked are for PLoS Computational Biology.

See you both tomorrow, I hope.

rainersachs commented 5 years ago

Hi all: Here is the first attempt at the title and abstract of the major class paper with Hada. You may want to read through it to see how your part of the project fits into the whole and/or make suggestions and corrections. Title_and_abstract_class_paper_v1.pdf

rainersachs commented 5 years ago

Hi Andy: While working on Hada's six-ion mixture P-24 .xlsx (which I will send you by email, because it is locked so I cannot make it into a .pdf, and GitHub doesn't like .xlsx), I found seven more rows for the 1-ion CSV.csv that I am preparing in my sandbox and will push to GitHub as the master 1-ion .csv as soon as I can (so Peter and Kuluhu can work on the paper ASAP). A .pdf of the under-construction 1-ion .csv is attached to this comment. Please let me know ASAP if you see any obvious mistakes or duplications in the added cells. There are omissions, which I will fill in later, but I want to catch mistakes before I make further entries, e.g. extra rows for the 4-ion mixture. If you don't have time, let me know right away. If you do have time, let me know right away with an estimate of when you will have an answer (I don't need a full answer yet, just a statement that you see no obvious mistakes or have found what looks like a mistake). I am confused because I found a sheet 3 in P-24 with the extra rows, and I can't see where it came from. CSV.pdf

AndyZhaoly commented 5 years ago

Hi Professor Sachs

I am busy with exams this week and I will take a look at the data on Wednesday and meet with you on Thursday at usual time. I will also help you on finishing 3-sheet xlsx after the spring break.

Sincerely

Andy

On Monday, March 18, 2019, at 2:17 PM, Rainer K. Sachs, Professor Emeritus <notifications@github.com> wrote:

> Hi Andy: While working on Hada's six ion mixture P-24 .xlsx [...]

-- Andy Zhao • 赵力阳

Bachelor of Arts Double Major in Statistics and Applied Mathematics University of California, Berkeley, Class of 2019

rainersachs commented 5 years ago

Hi! Hada has accepted our plan in principle. Here is her email of 3/25/2019:

"Hi Ray, It is great that you have been completing 82-6 data. J-218, F-0-i and F-5-i are cells I pretreated with inhibitor. Please omit these data. The situation of our team's publications is: 1) Ianik's paper with 4 single beams (no shielding): submitted to RRS in September, got reviewers' comments in January, rewritten to answer the reviewers' comments and resubmitted in February. 2) Tony's paper with 2 beams (shielding): submitted to RRS in December, got reviewers' comments recently; Tony is working on the revision. Please do not wait; go ahead and start preparing the paper. Attached is the PDF file of a previous presentation. Page 3 lists the 82-6 data available. If you need any of those, let me know."

I will need to double-check the .pdf Hada refers to above, to see that we have all that mixture data and the related 1-ion data. I will do that this week. Here is the .pdf itself: GCR Consortium #17 Hada_mainly_mixed_beam.pdf. Both the CA and HG pods are now urged, and equipped, by experimentalist colleagues to go full speed ahead with paper writing in accordance with the suggestions we made to them.

rainersachs commented 5 years ago

Hi pod! Hada has now given us full leeway on the big paper. We can write whatever we want, for any journal we want, as fast as we can, as slow as we have to, using whatever we want of any of her data. This comment concerns progress and our upcoming workflow as regards the now top priority PLoS Biology paper.

Coincidentally, the mouse pod is in almost exactly the same position: on Monday Blakeley and Chang gave us full authority to write a paper on three recent mixture experiments. They have deadlines, so I had to plan the mouse pod's work first. I hope the pod is now working on its own, so that the only thing it needs from me is writing the paper, and I am now returning to the job of entering Hada's data.

Peter: while working on the mouse pod data entry yesterday, I found a protocol that I think is much better than our present CA protocols. It starts with an Excel workbook sheet that has all sorts of information in a particularly convenient form. Among other things, it is so narrow that I can see its full width on one screen even when zooming in close enough for my lousy eyes to read it easily. I attach a .pdf of the nearly completed Excel sheet next, so you can see (looking near the bottom) why the format seems so convenient to me. I will send you the Excel version by email so you can manipulate it in your sandbox after whatever name changes you have already implemented.

1_ion_data_temporary.pdf

By deleting lots of rows and columns the worksheet can be turned into a comma separated input file. I will email that to you. Please check if it can be used with our R suite, as is or with minor changes. If so we should continue this approach or an approach which shares its main virtues. If not we have your most recent approach as a fallback option. The new protocol is slightly less automated than your protocol but is close enough to being automatic that we can implement it easily, since we will soon have entered most of the 1-ion rows we will ever use.

Andy, Kuluhu, Peter: going forward, the main assignments are as follows. I will do most of the writing, with some help from Kuluhu. Hopefully you guys can do most of the rest, because writing two papers (first drafts) before July will take almost all my time. Peter and Kuluhu should help me with figures, tables, and writing the paper. Andy should try to double-check my calculations and data entry once I have finalized the temporary Excel file whose .pdf is above.

As always your core classes should have definite priority but the more you can help the better. Thanks! Ray

rainersachs commented 5 years ago

Attached here is a .pdf that contains about 50 extra data points pulled together from various sources and, at the bottom, about 25 comments on different confounding factors. Probably about 25 rows will eventually change as we add, delete, and correct during the rest of this semester. However, the next step is to see whether all this new data breaks the nls() or other chunks of the code. I made a 1-ion input .csv from the information in the attached .pdf, but could not even get the main database to read the file. Until Peter and I can straighten that out by email, everything else needs to stay on hold, so I do not yet have suggested assignments for Andy and Kululu. In principle, having all that extra data is great: we will be able to write a definitive paper. But I was amazed to find how many different ways one can make a mistake during data entry.
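Given how many ways data entry can go wrong, a small validation pass before the rows reach nls() may catch problems early. A minimal Python sketch, assuming a hypothetical header (ion, dose, cells, aberrant) rather than the real 1-ion .csv layout. It reports missing columns, non-numeric entries, impossible counts, and duplicated rows.

```python
import csv
import io

REQUIRED = ["ion", "dose", "cells", "aberrant"]  # hypothetical header


def validate_rows(stream):
    """Return a list of human-readable problems found in a 1-ion input CSV."""
    reader = csv.DictReader(stream)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {missing}"]
    problems, seen = [], set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            dose = float(row["dose"])
            cells = int(row["cells"])
            aberrant = int(row["aberrant"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: non-numeric value")
            continue
        if dose < 0 or cells <= 0 or not 0 <= aberrant <= cells:
            problems.append(f"line {line_no}: impossible values")
        key = (row["ion"], dose, cells, aberrant)
        if key in seen:
            problems.append(f"line {line_no}: duplicate row")
        seen.add(key)
    return problems


sample = io.StringIO(
    "ion,dose,cells,aberrant\n"
    "Si,0.1,1500,12\n"
    "Si,0.1,1500,12\n"
    "Fe,abc,800,5\n"
    "Fe,0.2,100,150\n"
)
for problem in validate_rows(sample):
    print(problem)
```

Line numbers assume no quoted fields span multiple lines, which is usually true for a numeric input file like this one.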

Here is the informational .pdf. The input .csv may have to be changed so often that for the time being I will write emails to Peter about it, with copies to Andy and Kululu.

1_ion_data_info_4.7.2019.pdf

rainersachs commented 5 years ago

Here is an update of the "info" .pdf of the previous comment. Such updates will continue sporadically until Hada has answered all our questions and we have double checked our data base.

1_ion_data_info_4.10.2019.pdf

rainersachs commented 5 years ago

Hi Peter: here are minutes of a phone meeting Edward (edwardgh@berkeley.com GitHub user name eghuang) and I just had. In a week or so I will explain to you some of the background for the proposed interactions between you and the mouse HG pod mentioned in those minutes. Don't worry about the items in the minutes until you are ready to consider dealing with them -- no hurry. Minutes of 6.24.2019.RKS_EGH_phone.pdf