tmills / ctakes-docker

Apache License 2.0

RELATED: Continuously ingesting clinical documents #19

Open MatthewVita opened 6 years ago

MatthewVita commented 6 years ago

Hi Sean, Tim, cTAKES Community,

How do I make the cTAKES Collection Reader continuously "listen" for more clinical documents to process?

Some notes on how I am trying this:

Thanks,

Matthew Vita www.matthewvita.com

tmills commented 6 years ago

I think for this to work you'd need to change the collection reader you're using. The XML descriptor you're calling (FilesInDirectoryCollectionReader.xml) just points to a Java class called FilesInDirectoryCollectionReader, which essentially does an ls of the input directory to get a list of files and creates a CAS for each one. If you want a reader that sits and waits for new documents, you would have to modify that class (or write a new one) to do so.
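A minimal sketch of what "sits and waits" could look like, assuming a simple polling approach (Java's WatchService would be an alternative). PollingDirectoryReader and its methods are hypothetical names. hasNext()/getNext() only loosely mirror the UIMA CollectionReader methods; the sketch does not depend on UIMA or cTAKES. A real reader would wrap this logic and fill a CAS from each file in getNext().

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch (not part of cTAKES): unlike a one-shot directory listing,
 * this keeps re-scanning a directory and hands out each new file exactly once.
 * Intended to be driven by a single reader thread; stop() may be called from another.
 */
final class PollingDirectoryReader {
    private final Path dir;
    private final long pollMillis;
    private final Set<Path> seen = new HashSet<>();      // every file ever queued
    private final Deque<Path> pending = new ArrayDeque<>(); // found but not yet returned
    private volatile boolean stopped = false;

    PollingDirectoryReader(Path dir, long pollMillis) {
        this.dir = dir;
        this.pollMillis = pollMillis;
    }

    /** Blocks until a new file is available; returns false only after stop(). */
    public boolean hasNext() throws IOException, InterruptedException {
        while (pending.isEmpty() && !stopped) {
            scan();
            if (pending.isEmpty()) {
                Thread.sleep(pollMillis); // nothing new yet, wait before re-scanning
            }
        }
        return !pending.isEmpty();
    }

    /** Returns the next unseen file; call only after hasNext() returned true. */
    public Path getNext() {
        return pending.removeFirst();
    }

    public void stop() {
        stopped = true;
    }

    // Queue every regular file in dir that has not been queued before.
    private void scan() throws IOException {
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && seen.add(p)) {
                    pending.addLast(p);
                }
            }
        }
    }
}
```

One caveat with any polling design: a note may be picked up while it is still being written, so it is safer to have producers write to a temporary name elsewhere and then move the finished file into the watched directory.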

The alternative is that it's perfectly OK to have more than one collection reader, or to run multiple collection readers serially. In the simplest case, you could have a script that runs once a day and launches the command that calls the collection reader. You would need some way to make sure it doesn't re-process the same notes on later runs. This requires the pipeline and broker container(s) to be "on" all the time, but if they're just waiting they probably don't consume many resources.
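For the once-a-day approach, one simple way to avoid re-processing is a ledger of file names that have already been sent through the pipeline. The sketch below assumes that approach; ProcessedLedger, unprocessed() and markProcessed() are hypothetical names, and moving finished files into an archive directory would be an equally valid alternative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper (not part of cTAKES) for a scheduled batch run: file names
 * already handed to the pipeline are recorded one per line in a plain-text ledger,
 * so the next run skips them.
 */
final class ProcessedLedger {
    private final Path ledger;
    private final Set<String> done = new HashSet<>();

    ProcessedLedger(Path ledger) throws IOException {
        this.ledger = ledger;
        if (Files.exists(ledger)) {
            done.addAll(Files.readAllLines(ledger, StandardCharsets.UTF_8));
        }
    }

    /** Regular files in inputDir whose names are not in the ledger, sorted by path. */
    public List<Path> unprocessed(Path inputDir) throws IOException {
        List<Path> todo = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && !done.contains(p.getFileName().toString())) {
                    todo.add(p);
                }
            }
        }
        Collections.sort(todo);
        return todo;
    }

    /** Appends the file's name to the ledger; call after the pipeline has finished with it. */
    public void markProcessed(Path file) throws IOException {
        String name = file.getFileName().toString();
        if (done.add(name)) {
            Files.write(ledger, List.of(name), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }
}
```

Marking a note only after it has been processed means a crash mid-run leads to that note being processed again rather than silently skipped. The ledger file should live outside the input directory, otherwise it would show up as a note itself.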

Hope this is helpful.

MatthewVita commented 6 years ago

Hi @tmills,

I started looking into your suggested approach. I'm thinking that editing the src/main/java/org/apache/ctakes/core/cr/FilesInDirectoryCollectionReader.java file is best.

It looks like the cTAKES download in the container is a binary release... Do you know how I can copy my new FilesInDirectoryCollectionReader class in so that it gets used, or will I have to create my own fork/binary build of cTAKES to accomplish this?

Thanks, Matthew