plazi / arcadia-project

service: automatic upload and processing of PDFs from Zenodo to TB #134

Open punkish opened 4 years ago

punkish commented 4 years ago

Please see the conceptual diagram below for automated upload and processing of PDFs from Zenodo to TB. When a user uploads a PDF to Zenodo, a trigger fires causing the PDF to be sent to TB where it is processed, and the extracted treatments, figures, tables, etc. are sent back to Zenodo, stored, and linked to the PDF.
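A rough in-memory sketch of that round trip, just to pin down the order of events. Every name here (`Repository`, `Record`, `upload_pdf`, `link_derivatives`, `make_processor`) is hypothetical and stands in for the real services; none of it is the actual Zenodo or TB API, and the "extracted" items are placeholder values.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-ins only; not the Zenodo or TB API.


@dataclass
class Record:
    pdf_name: str
    # Extracted items linked back to the PDF, keyed by kind.
    derivatives: dict[str, list[str]] = field(default_factory=dict)


class Repository:
    """Stand-in for Zenodo: stores PDFs, creates records, fires triggers."""

    def __init__(self) -> None:
        self.records: dict[int, Record] = {}
        self.triggers: list[Callable[[int, bytes], None]] = []

    def upload_pdf(self, name: str, pdf: bytes) -> int:
        record_id = len(self.records) + 1
        self.records[record_id] = Record(pdf_name=name)  # store PDF, create record
        for trigger in self.triggers:                    # notify registered processors
            trigger(record_id, pdf)
        return record_id

    def link_derivatives(self, record_id: int, items: dict[str, list[str]]) -> None:
        self.records[record_id].derivatives.update(items)


def make_processor(repo: Repository) -> Callable[[int, bytes], None]:
    """Stand-in for TB: 'processes' a PDF and sends the results back."""

    def process(record_id: int, pdf: bytes) -> None:
        # Placeholder result; real extraction happens in TB.
        extracted = {"treatments": ["treatment-1"], "figures": [], "tables": []}
        repo.link_derivatives(record_id, extracted)

    return process


repo = Repository()
repo.triggers.append(make_processor(repo))
rid = repo.upload_pdf("paper.pdf", b"%PDF-1.4 ...")
```

In the real setup the trigger would be an asynchronous call across services rather than a direct function call, so the results would arrive some time after the upload returns.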

@slint, would it be possible to create this trigger in the next few days? (As @gsautter mentions, you would need to provide him with a place to register his callback.) Getting this going now means @gsautter can start putting the plumbing in place to get it all working. Then, when we are all in Genève in February, we will be able to fully test it all together, face to face.

Please let us know if you need anything from us to make this happen, and keep us updated on when it is working, even if only in draft mode. Many thanks.

                        ┌─────────────┐                          
                        │treatments   │                          
        ┌───────────────│figures      │◀─────────────────┐       
        │               │tables       │                  │       
        │               └─────────────┘                  │       
        │                                                │       
        │                                                │       
        ▼                                                │       
┌───────────────┐                                        │       
│┌─────────────┐│     ┌─────────┬ ─ ─ ─ ─ ─ ┐            │       
││   Zenodo    ││     │ trigger │                        │       
│└─────────────┘│     ├─────────┘           │     ┌─────────────┐
│┌─────────────┐├────▶  ┌───────────┐             │┌───────────┐│
││             ││     │ │ PDF store │       │     ││ Frankfurt ││
││             ││       └─────┬─────┘             │└───────────┘│
││             ││     │       │             │────▶│┌───────────┐│
││             ││       ┌─────▼─────┐             ││    GGI    ││
││             ││     │ │  create   │       │     ││ processes ││
││             ││       │  record   │             ││    PDF    ││
││             ││     │ └───────────┘       │     │└───────────┘│
│└─────────────┘│                                 └─────────────┘
│┌─────────────┐│     └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘                    
││ upload PDF  ││                                                
│└─────────────┘│                                                
└───────────────┘                                                
             ▲             
             │             
─────────────┴─────────────
            .─.            
           (   )  │        
            `─'   │        
        ┌────│────┘        
   .─.  │    │       .─.   
  (   ) │ ┌─────┐   (   )  
 ┌─────┐  │     │  ┌─────┐ 
 │     │  └─┬─┬─┘  │     │ 
 └─────┘    │ │    └─────┘ 
         ───┘ └───         

           users

gsautter commented 4 years ago

I like the setup, and the diagram pretty much nails what I have in mind. Frankfurt has a generic endpoint for callbacks (even though there is nothing listening right now), so what we'll need is a place in Zenodo to register such callbacks to happen when new PDFs are uploaded, and then I can start building the handling on the Frankfurt end.
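To make the callback handling concrete, here is a minimal sketch of what the receiving side could validate. The payload shape (`event`, `record_id`, `pdf_url`), the event name `pdf.uploaded`, and the function `handle_callback` are all assumptions for illustration; the actual format would be whatever Zenodo ends up sending, and this is not the existing Frankfurt endpoint.

```python
from __future__ import annotations

import json

# Assumed payload fields; the real callback format is not yet defined.
REQUIRED_FIELDS = ("event", "record_id", "pdf_url")


def handle_callback(body: bytes) -> tuple[int, dict]:
    """Validate a callback payload and return (HTTP status, response body)."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return 400, {"error": "body is not valid JSON"}
    if not isinstance(payload, dict):
        return 400, {"error": "payload must be a JSON object"}
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        return 400, {"error": f"missing fields: {', '.join(missing)}"}
    if payload["event"] != "pdf.uploaded":
        # Unknown events are acknowledged but not acted on.
        return 202, {"status": "ignored"}
    # A real handler would queue the PDF at pdf_url for processing here.
    return 202, {"status": "queued", "record_id": payload["record_id"]}
```

Keeping the validation in a plain function like this means it can be exercised without a running server and wrapped in whatever HTTP framework the endpoint already uses.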

Just curious, though, why is the PDF upload only for giraffe people?