Closed koustuvsinha closed 6 years ago
Verified with the organizers: we cannot use an API.
Update:
After installing Elasticsearch and loading the Wikipedia dump, the Docker image is now so big that `docker commit` fails with a no-space-left error.
Probably the var boot on my server doesn't have enough space for this huge model (150 GB+). Possible workaround: download the indices on the client side after Docker has been initialized.
While zipping the indices I am out of disk space again 😢
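One way to catch this before starting a long commit or zip step might be to check free space first. The snippet below is only a sketch: `has_room`, the 10% margin, and the `"."` path are made up for illustration, and the 150 GB figure is just the rough size mentioned above.

```python
import shutil

GB = 1024 ** 3


def has_room(path: str, required_bytes: int, margin: float = 0.1) -> bool:
    """Return True if the filesystem holding `path` has at least
    `required_bytes` plus a safety margin of free space."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes * (1 + margin)


# "." is a placeholder; point this at wherever the indices would be written.
if not has_room(".", 150 * GB):
    print("Not enough free space here; download the indices on the client side instead.")
```

Note that zipping needs room for the archive in addition to the original indices, so the required size should account for both.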
Should we use this? https://github.com/facebookresearch/DrQA
that way we can take a subset of the Wiki corpus or other QA corpora to ask questions about
It is only 25 GB, and we can train the model to retrieve the answers.
oh WOW!!!!! This makes life so easy!!
Ok, I am not so sure about DrQA's performance now. Our initial understanding was that if a user asks a question, we extract the entity and search the wiki dump to get a one-liner. This is what I get after a few iterations:
wait, what do you mean after a few iterations? Like epochs?
^ no, not epochs, this was a pretrained model. By iterations I meant the number of times I tested 😛
Maybe we can try increasing the top-X, especially the n-docs: https://github.com/facebookresearch/DrQA/blob/master/scripts/pipeline/interactive.py
Well, we could include this model, since it sometimes fetches the correct answer. The only problem is that the processing time is quite long. We could do one thing: first assess whether the question is related to the document (entity overlap), and if so use the vanilla DrQA to answer it. If not, we could send a request to this model to generate the answer, and later in the conversation send the response to the user, like "Btw, you asked about blah, I think the answer is blah".
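A rough sketch of that routing step, just to make the idea concrete. It uses plain content-word overlap as a cheap stand-in for real entity extraction, and the names `route`, `overlap_ratio`, the stopword list, and the 0.5 threshold are all made up for illustration, not taken from DrQA.

```python
import re

# Tiny illustrative stopword list; a real system would use a proper one
# or an actual entity extractor.
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "were", "of", "in", "on", "at",
    "to", "do", "does", "did", "who", "what", "when", "where", "which",
    "why", "how",
}


def content_words(text: str) -> set:
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def overlap_ratio(question: str, document: str) -> float:
    """Fraction of the question's content words that also occur in the document."""
    q_words = content_words(question)
    if not q_words:
        return 0.0
    return len(q_words & content_words(document)) / len(q_words)


def route(question: str, document: str, threshold: float = 0.5) -> str:
    """Pick the fast document reader when the question looks related to the
    document, otherwise fall back to the slower Wikipedia retriever."""
    if overlap_ratio(question, document) >= threshold:
        return "document_reader"
    return "wiki_retriever"


doc = "Marie Curie won the Nobel Prize in Physics in 1903."
print(route("Who won the Nobel Prize in Physics?", doc))
print(route("What is the capital of France?", doc))
```

When `route` returns `"wiki_retriever"`, the request could be queued in the background and the answer delivered later in the conversation, as described above.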
Extract facts from Wikipedia using the Elasticsearch API
Implementation ideas:
The Wikipedia dump can be found here. Given the huge size of this dump, though, I don't know whether it would be feasible for us to wrap it within our Docker container. An easier way to tackle this problem would be to use the Python wikipedia API, but as per the ConvAI rules we are not allowed to make external API calls.
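For the local Elasticsearch route, a lookup could be sketched roughly as below. This only builds the request body and trims a result to a one-liner; it does not talk to a server. The field names `title` and `text` are assumptions about how the dump would be indexed, and `build_fact_query` and `first_sentence` are hypothetical helpers.

```python
import re


def build_fact_query(entity: str, size: int = 1) -> dict:
    """Build an Elasticsearch query body that searches for `entity`
    in the (assumed) `title` and `text` fields, weighting the title higher."""
    return {
        "size": size,
        "_source": ["title", "text"],
        "query": {
            "multi_match": {
                "query": entity,
                "fields": ["title^2", "text"],
            }
        },
    }


def first_sentence(text: str) -> str:
    """Return the first sentence of a passage, to use as a one-line fact."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)
    return parts[0]


body = build_fact_query("Marie Curie")
print(body["query"]["multi_match"]["fields"])
print(first_sentence("Paris is the capital of France. It is a large city."))
```

The body would be sent to an Elasticsearch node running inside the container, so no external API call is involved.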