Thanks for putting this together, @jgallowa07.
I suspect that we will find it easier to use the pre-built images for phipdata (https://quay.io/repository/biocontainers/bioconductor-phipdata?tab=tags) and that there's an appreciable chance that it would be harder to add phipdata to the phip-flow docker image. That's just based on my experience working with installing R dependencies in Docker in the past.
I'm not absolutely horrible with R, although it's not my strong suit. I'm happy to take the first crack at this if you like. Otherwise, if you want to make the first attempt and have me fill in the details that would be fine too. Just let me know your preference.
Oh, didn't catch the pre-built image. Great!
I'll let you take the lead, and come in whenever I can be most helpful. Thank you!
I'd love to add the BEER results upstream of the AGG subworkflow, so that those results can be wrapped into the reports that Terry is looking at. How hard do you think it would be to take the results from BEER and add them to the xarray?
Do you think that's something I could pass back to you once I've got BEER working, @jgallowa07?
@sminot Yes, reading data back into the xarray dataset and merging should be easy - and I'm happy to do that once you have some BEER results to confirm.
In this case the process rule would sit below (taking input from) `replicate_counts`, in parallel with `CPM` and `size_factors` in the STATS workflow. We could then pretty easily merge the results back in to sit nicely with everything else.
I just pushed a commit to the `add_beer` branch. I'm able to get the data set up as a PhIPData object. However, when I run the first step in the BEER tutorial (running edgeR) I get an error message:
```
Stop worker failed with the error: wrong args for environment subassignment
Error: BiocParallel errors
1 remote errors, element index: 1
11 unevaluated and other errors
first remote error:
Error in edgeR::exactTest(data): dispersion is NA
In addition: Warning message:
In estimateDisp.default(y = y$counts, design = design, group = group, :
There is no replication, setting dispersion to NA.
```
I also tried running BEER directly on the PhIPData object without first running edgeR, but I got a different error message. When I looked in the BEER source code to debug that error, I found that it was looking for the edgeR outputs.
@jgallowa07, do you think that it's possible that edgeR won't run on datasets which have only a single beads-only control, as this test dataset does? It might be worth digging in a bit to see if that is true, in which case we might need to make this conditional on >1 beads-only control. I just wanted to check and see what you thought before I started to go down that road.
I suppose I'll just punt on that question for now and use a test dataset for development which has >1 beads-only control.
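For reference, a minimal sketch of the guard discussed above, assuming a `PhIPData` object named `phip_obj` with the standard `group` column in its sample annotations (`runEdgeR()` comes from the `beer` package, `getBeadsName()` from `PhIPData`):

```r
library(PhIPData)
library(beer)

# edgeR estimates dispersion from replicated beads-only samples; with a single
# beads-only control it sets dispersion to NA and exactTest() errors out.
n_beads <- sum(phip_obj$group == getBeadsName())

if (n_beads < 2) {
  warning("Only ", n_beads, " beads-only sample(s) present; skipping edgeR/BEER.")
} else {
  phip_obj <- runEdgeR(phip_obj)
}
```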
Sorry - I'm out this week. I'll update the test dataset this upcoming week to include more than one beads-only sample. I'm not sure if BEER requires more than one mock IP, but I imagine that's the case given the approach. And if so, a simple requirement check would indeed be the route to take IMO. I'll look more closely early next week as well. Thanks so much, Sam!
In my local test run of the `add_beer` branch, I'm getting this error from BEER itself:
```
── Running JAGS ────────────────────────────────────────────────────────────────
Sample runs
Error in reducer$value.cache[[as.character(idx)]] <- values :
wrong args for environment subassignment
Calls: brew ... .bploop_impl -> .collect_result -> .reducer_add -> .reducer_add
In addition: Warning message:
In parallel::mccollect(wait = FALSE, timeout = 1) :
1 parallel job did not deliver a result
```
This is the point in the project when we have to think long and hard about whether it's worth it to dig into this error and figure out whether it can be fixed and how. When you get a chance, if you could run your test data @jgallowa07 and see if you get a similar error, that would be greatly appreciated. It could be that this tool just isn't robust enough to adopt into our larger workflow.
@sminot - I've run your branch on a new subset of the data that included two beads_only samples, and I believe it ran to completion?
```
(base) quokka 40/aaf64e46b1cd03b9ff84f44476efdb » tail -n 15 .command.log
[1] "Building PhIPData object"
! Missing peptide start and end position information. Replacing missing values with 0.
[1] "Running edgeR"
[1] "Adding edgeR hits"
[1] "Setting up BEER"
[1] "Running BEER::brew"
── Running JAGS ────────────────────────────────────────────────────────────────
Sample runs
── Summarizing results ─────────────────────────────────────────────────────────
[1] "Getting matrix of peptides that were run"
[1] "Identifying super-enriched peptides"
[1] "Rerunning BEER"
```
and
```
(base) quokka 40/aaf64e46b1cd03b9ff84f44476efdb » tail -n 15 .command.err
The following objects are masked from ‘package:stats’:
    filter, lag
The following objects are masked from ‘package:base’:
    intersect, setdiff, setequal, union
! Missing peptide start and end position information. Replacing missing values with 0.
── Running JAGS ────────────────────────────────────────────────────────────────
Sample runs
── Summarizing results ─────────────────────────────────────────────────────────
```
The exit code was 140, which seems as if it may have something to do with exceeding resource allocations... but maybe it's simply that the script doesn't write the output file yet?
Oh, that's great!
Can you get me oriented to your test data setup? I'll switch to using that. And maybe we'll just make the workflow robust to the error case and have it omit the BEER results if they can't be made. But at least if I can use this test data then I can get the workflow set up correctly for the best case scenario.
I randomly sampled 2 library controls, 2 beads_only controls, 2 healthy, and 2 COVID-positive samples from the same Pan-CoV dataset described here. The full reads files were run to give me the results posted above. I took the first 12000 lines (3000 reads) of each of these and pushed them here. I'm running now on the cluster, as before -- and no failure yet. But it's been running for a couple of hours. I'm assuming my previous run from above timed out, given the 140.
Quick update: the test set pushed to the `add_beer` branch reproduced the results above.
The pipeline was run from our group node, quokka, submitting to the gizmos like so:
```bash
#!/bin/bash
set -e
source /app/lmod/lmod/init/profile
module load Nextflow/22.04.0
module load Singularity
export PATH=$SINGULARITYROOT/bin/:$PATH
/usr/bin/time nextflow main.nf \
    --sample_table data/pan-cov-example-with-beads/sample_table.csv \
    --peptide_table data/pan-cov-example-with-beads/peptide_table.csv \
    -with-report output_local/nextflow_report.html \
    -profile cluster
```
When you say "reproduced the results", which results are you referring to? I'm getting the run to go through on my laptop so I can work out an approach for catching failure exceptions.
Sorry, I just mean the same log output up to "── Summarizing results ──" before the 140 exit code.
Well this is odd, I got the following error from your test dataset:
```
Error in reducer$value.cache[[as.character(idx)]] <- values :
wrong args for environment subassignment
Calls: brew ... .bploop_impl -> .collect_result -> .reducer_add -> .reducer_add
In addition: Warning message:
In parallel::mccollect(wait = FALSE, timeout = 1) :
1 parallel job did not deliver a result
Execution halted
Error while shutting down parallel: unable to terminate some child processes
```
I think I'll try running it on the cluster next
Huh. Likewise, I’ll try running locally.
@jgallowa07, what parameters are you using for this test dataset?
All defaults except pointing to the new sample/peptide table.
Hey @sminot, late follow-up as we're finalizing a re-submission of the manuscript.
I took another swing at running BEER locally on a test set (without library samples) for a little longer, and it looks like it was successful after about 2 1/2 hours (!). I pushed the test dataset to the `add_beer` branch.
For reference, I ran the current bleeding edge of `add_beer` like so:
```bash
nextflow run main.nf \
    -profile docker \
    --sample_table data/pan-cov-example-with-beads-no-lib/sample_table.csv \
    --peptide_table data/pan-cov-example-with-beads-no-lib/peptide_table.csv \
    --output_tall_csv true \
    --output_wide_csv true \
    --results $(date -I)-test
```
Note that the profile for batch submission (`-profile cluster`) defaults the time limit to 1 hr, so that would need to be changed if running there.
The stdout for `run_beer` looked like:

```
[1] "Building PhIPData object"
[1] "Running edgeR"
[1] "Adding edgeR hits"
[1] "Setting up BEER"
[1] "Running BEER::brew"
[1] "Getting matrix of peptides that were run"
[1] "Identifying super-enriched peptides"
[1] "Rerunning BEER"
```
And the Nextflow error was:

```
Error executing process > 'STATS:beer_enrichment:run_beer (1)'

Caused by:
  Missing output file(s) `beer_results.tsv` expected by process `STATS:beer_enrichment:run_beer (1)`

Command executed:
  run_beer.Rscript
```
Obviously that's because there's no code for writing that TSV currently. So this looks good (although I haven't looked at any of the actual results)!
While this analysis takes a long time to run (which they note in the BEER paper), I suppose it's probably worth finishing the integration here? I'm happy to spend a little time to do this (shouldn't take long) but wanted to see if y'all had any thoughts before moving ahead (cc @matsen @ksung25).
Sam, AFAICT this would include:

1. Taking the final `phip_obj` and writing it to binary using `saveRDS()` (optionally?).
2. Writing the expected `beer_results.tsv` output from the `beer_enrichment` workflow (see the sketch below).

After that I think the pipeline should handle merging it with the rest of the stats and outputting tall / wide / pickle xarray with all the edgeR/BEER results tied in.
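A rough sketch of what the tail end of the `run_beer` script could look like under that plan (the `beer_results.tsv` name matches what the process already expects; the `"prob"` assay name is an assumption, so check `assayNames()` on the object BEER returns):

```r
library(SummarizedExperiment)

# Persist the full PhIPData object so users can inspect or rerun BEER themselves.
saveRDS(phip_obj, "phip_obj.rds")

# Flatten one of the result assays into the flat file the Nextflow process
# declares as output; "prob" is a placeholder assay name.
beer_hits <- as.data.frame(assay(phip_obj, "prob"))
write.table(beer_hits, "beer_results.tsv",
            sep = "\t", quote = FALSE, col.names = NA)
```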
Hey @jgallowa07, I'm really glad to hear that you were able to make it further through this process. I've made a clean testing environment with your updated `add_beer` branch and am testing it out locally as well.
I like the idea of continuing the integration, and your steps seem reasonable.
The one piece of architecture that I'd like to make sure we have a plan for is robust error handling. Based on our experience with BEER up until this point, I don't think I have a good idea of when it will or will not complete successfully. To make the user experience more streamlined, I would advocate for a setup where a failure at the BEER step is handled gracefully and the rest of the workflow can just proceed without it.
That sort of conditional functionality is not incredibly hard to implement in Nextflow, but I thought we should talk over that high-level goal before digging into the implementation.
What do you think about making BEER a nice-to-have but not required output from the pipeline?
Awesome - thanks for the sanity check :crossed_fingers:! And I absolutely agree, no requirement of success needed. Off the top of my head, I think we could do one or more of the following to achieve this behavior:

- Use `errorStrategy 'ignore'` - and maybe point the user to the working directory for that process in case they want to investigate further? Or maybe we just let NF do that for them? You probably know what's best here.
- I've found that running `edgeR` is fast, and apparently produces good results on its own, so maybe we should separate that from the BEER run process? In the paper they note:

> Since edgeR does not rely on MCMC procedures and thus generates inference faster than BEER, we further investigated the performance of edgeR analyzing PhIP-Seq data in a two-group comparison with the mock IP samples in one group and a single serum sample in the other group (assuming equality of group variances) and found that BEER and edgeR are equally effective in detecting moderately and strongly reactive peptides, but BEER is needed to detect weakly reactive peptides
> PhIP-Seq experiments require the use of negative controls (so-called ‘beads-only’ or ‘mock immunoprecipitations’ lacking antibody input), which are included as 4–8 wells of a 96-well plate, and are compared to the read counts from each individual serum sample on the plate.
I like the idea of running edgeR independently and outputting those results regardless of whether BEER finishes.
Using `errorStrategy 'ignore'` is definitely the right approach, with two considerations that come up for me:

1. Since the 140 exit code suggests the failures may come from exceeding resource limits, it might be worth `errorStrategy { task.exitStatus in 137..140 ? 'retry' : 'ignore' }`. Obviously this works best with autoscaling resources (e.g. `memory { 8.GB * task.attempt }` and `cpus { 4 * task.attempt }`).
2. Downstream processes need to tolerate the missing output, e.g. by using `.collect()` on the upstream channel and making the consuming process deal explicitly with the empty-inputs case. Looking at the current code, I think this should already be handled pretty well. We just need to make sure that the `merge_binary_datasets` task works equally well with and without the BEER results being created upstream.

Another question -- how much of this were you planning on tackling vs. needing my help on? I'm invested in helping you get this wrapped up, but I also don't need to take anything away from you that you were already planning to do.
Amazing, I hadn't thought about either of those!
> Another question -- how much of this were you planning on tackling vs. needing my help on?
Thanks Sam :) I am quite busy atm so any amount that you can tackle would be so so helpful. I'll probably have some time by Tuesday - but the branch is all yours until then if you have time.
Another thought: while I simply removed the library samples from the test dataset, we should probably avoid running either `edgeR` or `BEER` on those "ctrl" samples when they are indeed present in the dataset?
Here's where I get worried -- my 'sanity check' run of the clean `add_beer` branch using the command you ran resulted in an error which did not produce any output inside the BEER task.
Since everything is working inside Docker and the inputs are the same, I'm at a loss to explain why this would work on your computer but not on mine.
An idle thought is that perhaps you have more resources allocated to Docker on your computer, and my failure may actually be an autokill from the Docker daemon which is not being reported out due to the mclapply wrapper. Here's my Docker resource allocation, in case yours is quite different:
Logs:

```
Digest: sha256:689897f1d177b6bb214cf71870ca064b85b6e53680b0b1525eab1e9cbc498129
Status: Downloaded newer image for quay.io/biocontainers/bioconductor-beer:1.2.0--r42hdfd78af_0
Loading required package: PhIPData
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats
Attaching package: ‘MatrixGenerics’
The following objects are masked from ‘package:matrixStats’:
colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
colWeightedMeans, colWeightedMedians, colWeightedSds,
colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
rowWeightedSds, rowWeightedVars
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, aperm, append, as.data.frame, basename, cbind,
colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
table, tapply, union, unique, unsplit, which.max, which.min
Loading required package: S4Vectors
Attaching package: ‘S4Vectors’
The following objects are masked from ‘package:base’:
expand.grid, I, unname
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Attaching package: ‘Biobase’
The following object is masked from ‘package:MatrixGenerics’:
rowMedians
The following objects are masked from ‘package:matrixStats’:
anyMissing, rowMedians
Loading required package: rjags
Loading required package: coda
Linked to JAGS 4.3.0
Loaded modules: basemod,bugs
Attaching package: ‘dplyr’
The following object is masked from ‘package:Biobase’:
combine
The following objects are masked from ‘package:GenomicRanges’:
intersect, setdiff, union
The following object is masked from ‘package:GenomeInfoDb’:
intersect
The following objects are masked from ‘package:IRanges’:
collapse, desc, intersect, setdiff, slice, union
The following objects are masked from ‘package:S4Vectors’:
first, intersect, rename, setdiff, setequal, union
The following objects are masked from ‘package:BiocGenerics’:
combine, intersect, setdiff, union
The following object is masked from ‘package:matrixStats’:
count
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
[1] "Building PhIPData object"
! Missing peptide start and end position information. Replacing missing values with 0.
[1] "Running edgeR"
[1] "Adding edgeR hits"
[1] "Setting up BEER"
[1] "Running BEER::brew"
── Running JAGS ────────────────────────────────────────────────────────────────
Sample runs
Error in reducer$value.cache[[as.character(idx)]] <- values :
wrong args for environment subassignment
Calls: brew ... .bploop_impl -> .collect_result -> .reducer_add -> .reducer_add
In addition: Warning message:
In parallel::mccollect(wait = FALSE, timeout = 1) :
1 parallel job did not deliver a result
Execution halted
```

Any ideas, @jgallowa07?
No, unfortunately. It seems like you have plenty of resources. I only saw 2 CPUs being used and not more than a few gigs of RAM. Happy to run it with the Nextflow report and send it along if you're curious.
I just use Docker Desktop on Ubuntu; here's what `docker info` gives me:

```
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Build with BuildKit (Docker Inc., v0.5.0-docker)
Server:
Containers: 7
Running: 0
Paused: 0
Stopped: 7
Images: 35
Server Version: 20.10.1
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.4.0-148-generic
Operating System: Ubuntu 20.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 24.44GiB
Name: ubuntu
ID: HKZQ:HQYQ:4NG6:4U66:4VOB:3Q43:QWYT:4TO7:IWQT:YNUL:YTEI:ORKD
Docker Root Dir: /var/lib/docker
Debug Mode: false
Username: jgallowa
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
WARNING: No blkio weight support
WARNING: No blkio weight_device support
```

Unless this tells you something obvious, we should just run edgeR and output the PhIPData object for the users to run BEER themselves. That would be sufficient for the MS. Sorry for the headache!
Maybe this is relevant.

Doing `brew(sim_data, BPPARAM = BiocParallel::SerialParam())` is how you force the process not to multithread, which may be what's causing issues for you? Although my BEER seems to be running fine on 2 cores.
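If we did keep BEER in the pipeline, one way to use that would be a serial fallback when the forked workers die. A sketch only, not tested against this particular failure mode:

```r
library(BiocParallel)

# Try brew() with its default parallel backend first; if the fork-based
# workers fail (as in the mccollect() error above), retry single-threaded.
beer_out <- tryCatch(
  beer::brew(phip_obj),
  error = function(e) {
    message("brew() failed in parallel, retrying serially: ", conditionMessage(e))
    beer::brew(phip_obj, BPPARAM = SerialParam())
  }
)
```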
That's all I can think of - but it seems like this is enough of a pain that we should just export the R object and call it a wrap.
I'm inclined to agree. And even if we figure out how to make my system compatible, I suspect that we'd be exposing ourselves to all sorts of error reports from users coming from the BEER step. Let's export the data so that they can run BEER themselves, and then they won't submit bug reports here when they have trouble. That way we also don't have to work on keeping this workflow in sync with updates to BEER, which will likely be updated considerably if it gets traction.
Totally. Thank you for sanity checking!
Bringing this thread back to life again to address @matsen's comments from Slack:
> I do feel like we're leaving things in a weird state, that there is something that would seem great to incorporate but we aren't including it because we have some hiccup that we haven't even raised on their side. If it was producing bad results or had problematic behavior that would be something else. You don't think?
My response is based on the discussion above with @sminot.
Even while it does run for me, it is very slow (~3 hrs for MCMC on 4 empirical samples). IMO this is an additional hiccup that changes things a bit. The authors mention this a few times in the paper, stating that edgeR is much faster and sufficient for detecting moderately-to-strongly reactive peptides. In my mind, that is enough justification to include the option to run edgeR and not BEER. If it's easy for folks to load in the RDS and run BEER in their own environment, then we may even save the user some headache, because their environment can use resources more flexibly than the pipeline can (in the case of using containers, batch submission, etc.).
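The user-side hand-off would then be small. A sketch, assuming the pipeline exports the object as `phip_obj.rds` (the file name is an assumption):

```r
# Load the PhIPData object exported by phip-flow and run BEER locally, where
# resources can be tuned freely (e.g. more workers, longer wall time).
phip_obj <- readRDS("phip_obj.rds")
beer_out <- beer::brew(phip_obj,
                       BPPARAM = BiocParallel::MulticoreParam(workers = 4))
```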
Conversely, if we include the option to actually run BEER, we would need to:
@sminot Sorry to re-raise this issue (for a 3rd time? lol), but could you summarize your thoughts in response to this? Thanks, all.
My larger perspective on this question is that BEER represents a significant amount of technical debt, and I don't see that it clearly adds enough value to warrant taking that on.
The situations I foresee arising from the inclusion of BEER which could take up our time in the future are when a user selects the option to run BEER and:
Put another way, if we add BEER it will instantly become the most breakable and troublesome part of the whole pipeline. It's a recently developed piece of software, which is never a good sign. We don't have control or understanding of the source code. We aren't able to fix bugs that we find. We don't even have a good understanding of how much value it provides. Lastly, it's not like BEER is a well established standard in the field that people will expect us to support. This is not intended as a criticism of BEER specifically, just a practical skepticism of any newly built tooling.
If the goal of phip-flow is to provide a robust utility for processing PhIPseq data, then I think we'd be better served by leaving out BEER for now and reevaluating in a year or two to see if the method has been maintained and improved.
Looking forward to chatting over video, but the thing that stands out for me is that we are already running the BEER package in order to run edgeR.
BEER -> phip-flow (@sminot)
BEER has presented a very nice methodology for analyzing significant binding events. This method and software are described here. The documentation for the R interface can be found here. We would like to integrate this analysis into the phip-flow pipeline. Here's an outline of how I imagine this happening:
The process location in the DAG: A quick and easy way to go about this is to simply extend the DAG like the AGG subworkflow does. From here, we can pretty easily get all the inputs we need for the BEER analysis, i.e. counts for beads-only and empirical IPs. If we extend from the binary, we can use the Python API utilities to easily isolate the inputs for BEER as well. I'm agnostic as to whether or not we use the Python API to do this.
Dependencies: Looks like we will need the [PhIPData](https://www.bioconductor.org/packages/release/bioc/vignettes/PhIPData/inst/doc/PhIPData.html) and BEER R Bioconductor packages. I'm very unfamiliar with R packaging, but I imagine extending this image to include the R dependencies should not be too bad.
Transformation: The PhIPData object is a near-identical data structure to the xarray dataset we use in phippery (hilarious how similar their organization of the data is to ours) [screenshots: their PhIPData structure vs. our xarray structure]. So this shouldn't be hard. Essentially we need to read the sample, peptide, and peptide counts tables into an R script to be able to run BEER. I think this simply requires readr matrix objects from the wide output of phippery. I'm very bad with R, but it also seems that we may need to specify data types in each of the sample and peptide annotation tables when they're read in.
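A sketch of that transformation, assuming a wide counts CSV with peptide ids as row names and sample ids as column names, plus the phip-flow sample/peptide annotation tables (all file names and the `control_status` column here are assumptions about the table conventions):

```r
library(PhIPData)
library(S4Vectors)

# Read the wide counts matrix and the two annotation tables.
counts <- as.matrix(read.csv("counts.csv", row.names = 1, check.names = FALSE))
sample_meta  <- read.csv("sample_table.csv",  row.names = 1)
peptide_meta <- read.csv("peptide_table.csv", row.names = 1)

# PhIPData requires a `group` column in the sample annotations, with
# beads-only samples labeled as getBeadsName() (default "beads");
# `control_status` is a hypothetical stand-in for the real column name.
sample_meta$group <- ifelse(sample_meta$control_status == "beads_only",
                            getBeadsName(), sample_meta$control_status)

phip_obj <- PhIPData(counts = counts,
                     sampleInfo = DataFrame(sample_meta),
                     peptideInfo = DataFrame(peptide_meta))
```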
Output: Let's just stick with outputting whatever is easiest from the R pipeline for now? I imagine we could tie things back into the xarray dataset pretty easily, but this is not necessary for the manuscript re-submission.