hltfbk / Excitement-Transduction-Layer

1 stars 1 forks source link

Multithreaded #333

Open mmatthias opened 9 years ago

mmatthias commented 9 years ago

I think it is no big deal to make the resources thread safe. It would be an advantage to initialize some classes only once.

These classes are thread safe:

KeywordBasedFragmentAnnotator
new SentenceAsFragmentAnnotator

They only have one final variable and that one is thread safe more information here http://www.javamex.com/tutorials/synchronization_final.shtml

These classes are also thread safe but could be implemented in a better way:

TokenAsFragmentAnnotator DependencyAsFragmentAnnotator

tokenPOSFilter, dependencyTypeFilter, governorPOSFilter, dependentPOSFilter - These variables could be final because the access is only a read. These methods are used by the maps "contain" and "isEmpty".

These classes have to be modified:

CachedLAPAccess

This class is not thread safe. Main modification should be done by converting the Hashmap into an ConcurrentHashMap. There are some other variables that maybe has to be discussed if they have to be uniq for each call. Are the stored information unique for a fragment graph? Is it good to have a global CachedLAPAccess or should we have global Lap implementations and use CachedLAPAccess only in local context?

gilnoh commented 9 years ago

(On the issue of CachedLAPAccess) Yep. the class is not thread safe. I will work on it to make thread safe, via

I will add more (detailed) comments later, but for now, I believe that main gain of cached LAP comes from detecting / saving common calls and not-calling underlying tool (so saves time). This is possible because we only keep one instance (global). If we move to "local" one, then I would say we should use normal (non-cached) LAP. Which will simply use less memory. CachedLAPAccess saves time by using up more memory (storing CASes) ... (I am assuming that the same sentence can appear in other fragments, too.)

I will check the class in terms of thread-safety, and then get back to you again.

mmatthias commented 9 years ago

I do not know if this is the right way to solve this. I think the better way is to have a local cached one because our webserver runs without any shout down for month. If we cache the growing dataset in a global collection than we could run some day out of memory. ... As we only need to build the whole graph in one call I think it is enough to have the cache local. Would it be possible to have the implementations of the laps global and use the cache in a local way? ... I think we could modify the classes in this way. I could make the adaption with Kathrin when she is next time in our office.

gilnoh commented 9 years ago

Oh, now I see more clearly about what you mean by "global" and "local". I completely agree.

CachedLAP is only required when you build a graph; and once the graph is stable, you can throw out the CachedLAP instance (and all its memory). When you next requires to expand the graph, only then you need another (a new, clean) instance of CachedLAP. So in this sense, you should only activate one CachedLAP instance when you build / extend entailment trees.

I will make cached LAP more thread-safe and get back to this comment once more...

salvadora commented 9 years ago

I added the final keyword in the fragment annotators as requested.

gilnoh commented 9 years ago

I've updated CachedLAP as thread-safe as much as I can see. #336

Underlying LAPAccess is not thread safe (which would not be called much, since we are using cache), and I've surrounded it with synchornized method; and I've updated all instance variables to be accessed in a way that is thread safe.

I do not know the context of thread-safe attempt --- but just let me outline that, many other codes (as I see) of TL is not thread-safe. For example, codes like Entailment graph generations are not thread safe, and I guess it would be very hard to make all of them thread-safe (unless you do simply blocking such as wrap everything in sync, etc).

So, :-) I am assuming that you are trying to make codes that are related to USE-case 2 "usage" of the already built graph --- but not on building parts. Is this correct? :-) Just wanted to know this is the case.

Anyhow, CachedLAPAccess would be now thread-safe. --- eh, or so I believe.