ctmay4 closed this 8 months ago
Adding @garybeverungen
I made a first pass at implementing the keyword screening algorithm based on your notes. Let me know what you think.
FYI, I added a dependency on opencsv so I could use its CSVReader to read the internal keyword file (just like in SEER*DMS), but the POM shows a warning about it. Let me know if I need to remove it.
I'll take a look tomorrow. Please create a pull request.
Also, opencsv brings in a bunch of other commons dependencies. I've actually switched to fastcsv as my CSV library, but in this case I'd prefer to use no library at all. I think the easiest thing would be to switch the file to tab-separated values and remove the quoting. At that point you can just split on tabs and no library is needed.
@depryf told me he doesn't like tabs and suggested using a pipe instead. We just need to make sure none of the keywords contain a pipe; I did a quick check and it appears that none do.
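A minimal sketch of the no-library approach, assuming a hypothetical two-column `keyword|group` line format (the actual column layout of the internal keyword file isn't given in this thread). One gotcha: `String.split` takes a regular expression, so the pipe must be escaped; an unescaped `"|"` splits between every character. `PipeKeywordFile`, `KeywordEntry`, and `parse` are illustrative names only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class PipeKeywordFile {

    // Hypothetical shape of one line ("keyword|group"); the real file's columns may differ.
    static final class KeywordEntry {
        final String keyword;
        final String group;

        KeywordEntry(String keyword, String group) {
            this.keyword = keyword;
            this.group = group;
        }
    }

    // Parses pipe-delimited lines without any CSV library.
    static List<KeywordEntry> parse(BufferedReader reader) throws IOException {
        List<KeywordEntry> entries = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty())
                continue; // tolerate blank lines
            // split() takes a regex, so the pipe must be escaped; -1 keeps trailing empty fields
            String[] fields = line.split("\\|", -1);
            if (fields.length != 2)
                throw new IOException("Expected 2 pipe-separated fields, got " + fields.length + ": " + line);
            entries.add(new KeywordEntry(fields[0].trim(), fields[1].trim()));
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample input, for illustration only
        String sample = "carcinoma|Cancer\nneoplasm|Other\n";
        for (KeywordEntry e : parse(new BufferedReader(new StringReader(sample))))
            System.out.println(e.keyword + " -> " + e.group);
    }
}
```

In the library itself the reader would more likely wrap a classpath resource (e.g. `getResourceAsStream` plus an `InputStreamReader` with UTF-8). The strict field-count check also means a keyword that accidentally contains a pipe fails loudly at load time instead of being silently split.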
We need to move the logic from SEER*DMS to this library. A few things to consider:
As far as design, the workflow will go:
1. Initialization
2. Screen specific text
The result will contain the following:
Note that the Keyword class will include the keyword and the start/end position of the match, as well as ignored.
As for the keywords themselves, I ran this query:
and got the following. Do we need to keep the "Other" group? Do they factor into reportability?
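A minimal sketch of the two-step workflow (initialize once, then screen specific text), assuming whole-word, case-insensitive regex matching. That matching rule is an assumption, not necessarily what SEER*DMS does. Each `Keyword` carries the keyword and the start/end position of the match; the ignored part and the rest of the result's contents aren't spelled out in this thread, so they are left out. `KeywordScreener`, `ScreeningResult`, and `screen` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordScreener {

    // One match in the screened text; end is exclusive, like Matcher.end().
    static final class Keyword {
        final String keyword;
        final int start;
        final int end;

        Keyword(String keyword, int start, int end) {
            this.keyword = keyword;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return keyword + "[" + start + "," + end + ")";
        }
    }

    // Hypothetical result holder; here it only carries the list of matches.
    static final class ScreeningResult {
        final List<Keyword> keywords;

        ScreeningResult(List<Keyword> keywords) {
            this.keywords = keywords;
        }
    }

    private final List<String> keywordTexts = new ArrayList<>();
    private final List<Pattern> patterns = new ArrayList<>();

    // Initialization: compile each keyword once so repeated screen() calls stay cheap.
    KeywordScreener(List<String> keywords) {
        for (String kw : keywords) {
            keywordTexts.add(kw);
            // Whole-word, case-insensitive match (assumed rule); Pattern.quote treats the keyword literally.
            patterns.add(Pattern.compile("\\b" + Pattern.quote(kw) + "\\b", Pattern.CASE_INSENSITIVE));
        }
    }

    // Screening: collect every occurrence of every keyword, ordered by position in the text.
    ScreeningResult screen(String text) {
        List<Keyword> found = new ArrayList<>();
        for (int i = 0; i < patterns.size(); i++) {
            Matcher m = patterns.get(i).matcher(text);
            while (m.find())
                found.add(new Keyword(keywordTexts.get(i), m.start(), m.end()));
        }
        found.sort(Comparator.comparingInt((Keyword k) -> k.start).thenComparingInt(k -> k.end));
        return new ScreeningResult(found);
    }

    public static void main(String[] args) {
        // Made-up keywords and text, for illustration only
        KeywordScreener screener = new KeywordScreener(List.of("carcinoma", "tumor"));
        System.out.println(screener.screen("Tumor consistent with carcinoma.").keywords);
    }
}
```

Compiling the patterns in the constructor keeps initialization separate from screening, so a single screener instance could be reused across many texts.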
We will discuss in the meeting on Thursday.