Open isoboroff opened 2 years ago
@isoboroff when running the web services like so:
patapsco-web --run path/to/run --port 9090
It reads the configuration file saved in the run directory and uses the topic file section to grab the language of the queries (and uses the retrieve config for those parameters).
I think you're asking for the ability to override parts of the config on the command line. Is that right?
My main question is how the queries are parsed. The answer seems to be: the same way they are in batch mode. I think that's just word tokens with no operators or anything, right?
I'm adapting my collection search tool, which currently uses Elasticsearch, to use the Patapsco web service, on the hypothesis that it is better at tokenizing the languages I'm working with (Russian, Farsi, Chinese). Elasticsearch has a lot of web service functionality, like highlights, faceting, and pagination, which is nice when building an interactive search tool. It's also not hard to use Lucene query syntax, which supports some common operators.
Just adding the minimum configuration:
topics:
  input:
    lang: fas
retrieve:
  name: bm25
  number: 10
There is an error:
patapsco.error.ConfigError: 3 validation errors in configuration
topics.input.format - missing field
topics.input.source - missing field
topics.input.path - missing field
These fields of course don't make sense for interactive queries. Does it mean that the query endpoint is expecting a JSON object like a batch query?
(edited: removed bad stand-in config. I needed a basic "queries" section which was missing.)
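The error above names each missing key by its dotted path. As a rough illustration of that reporting style, here is a small sketch that walks a config object and lists absent paths. It is not Patapsco's validation code: `findMissingFields` is a hypothetical helper, and the list of required paths is an assumption built from the three paths in the error message plus `topics.input.lang` as an example of a field that is present.

```javascript
// Hypothetical helper: report which dotted paths are absent from a config object.
// Illustration only; this is not how Patapsco validates its configuration.
function findMissingFields(config, requiredPaths) {
  const missing = [];
  for (const path of requiredPaths) {
    let node = config;
    let found = true;
    for (const key of path.split('.')) {
      if (node !== null && typeof node === 'object' && key in node) {
        node = node[key];
      } else {
        found = false;
        break;
      }
    }
    if (!found) missing.push(`${path} - missing field`);
  }
  return missing;
}

// The minimal config from the comment above, written as a plain object.
const minimalConfig = {
  topics: { input: { lang: 'fas' } },
  retrieve: { name: 'bm25', number: 10 },
};

// Assumed required paths: the three from the error, plus one that is present.
const required = [
  'topics.input.lang',
  'topics.input.format',
  'topics.input.source',
  'topics.input.path',
];

const missingFields = findMissingFields(minimalConfig, required);
console.log(missingFields.join('\n'));
```

With the minimal config, only the three paths from the error are reported, since `topics.input.lang` is set.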
This is my javascript code
// Searching using Patapsco
var lang = targetLanguage;
lang = 'zho'; // FIX ME remove for release
var url = PATAPSCO_URL + '/' + lang;
const myRequest = new Request(url + '/query/' + inputQuery);
fetch(myRequest)
  .then(response => {
    console.log('Response:', response.status);
    if (!response.ok) {
      throw new Error('Network response was not OK');
    }
    return response.json();
  })
  .then(data => {
    console.log("Patapsco response");
    console.log(data);
    var results = data['results'];
    // Remember the query text echoed back by the service, if present
    if (data.query && data.query.text) {
      document.getElementById('target-query').dataset.recent = data.query.text;
    }
    console.log(results);
    // Build [rank, doc_id] pairs; rank is 1-based
    for (let i in results) {
      let id = results[i]['doc_id'];
      var doc_num = parseInt(i) + 1;
      let doc_info = [doc_num.toString(), id];
      document_list.push(doc_info);
    }
    console.log(document_list);
    possible_queries[inputQuery] = [inputQuery, document_list];
    console.log(possible_queries);
    buildDocumentList(document_list);
  })
  .catch(error => {
    document.getElementById('inner-hit-list').classList.remove('no-display');
    const findContainer = document.getElementById('find-document-container');
    findContainer.innerHTML = ' There was an error issuing the query...try again';
    console.error('There has been a problem with your fetch operation:', error);
  });
} // closes the enclosing function (not shown)
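The result handling in the snippet above can be pulled out into a pure function, which makes the assumed response shape easier to see and to test without a browser. This is only a sketch: the shape `{ query: { text }, results: [{ doc_id }, ...] }` is inferred from the fields the snippet reads, `results` is assumed to be an array, and `toDocumentList` and the sample data are made up for illustration.

```javascript
// Convert a response object into [rank, doc_id] pairs, with 1-based ranks as strings.
// Assumes the shape read by the snippet above; other fields are ignored.
function toDocumentList(data) {
  const results = Array.isArray(data && data.results) ? data.results : [];
  return results.map((hit, index) => [String(index + 1), hit.doc_id]);
}

// Made-up sample response; the doc IDs are placeholders.
const sampleResponse = {
  query: { text: 'example query' },
  results: [{ doc_id: 'doc-a' }, { doc_id: 'doc-b' }],
};

const documentList = toDocumentList(sampleResponse);
console.log(documentList);
```

A missing or non-array `results` field yields an empty list rather than an exception, which keeps the display code simple.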
Are you getting the error when setting up the web service?
--
Dawn J. Lawrie Ph.D. Senior Research Scientist Human Language Technology Center of Excellence Johns Hopkins University 810 Wyman Park Drive Baltimore, MD 21211 @.*** https://hltcoe.jhu.edu/faculty/dawn-lawrie/
Frankly, I'm trying to run the web service and send some queries from the command line so I can understand the request and response formats.
Your JS doesn't clarify the format of the query, and you appear to have a custom URL, which may mean you have a per-language proxy layer in there, or your own web service app.
I see in patapsco/topic.py that there seem to be hooks for Lucene query processing; I'll start poking through that.
@isoboroff Yes, processing of queries/topics in the web services is controlled by the configuration file used to create the index. Most people use term-based queries or PSQ. I added support for Lucene syntax but it has to be configured for that and is not interoperable with PSQ. The only documentation that I have on this is here: https://github.com/hltcoe/patapsco/blob/master/docs/config.md#lucene-classic-query-parsing
Hi @dlawrie, your JS code looks so subtle and concise. Could you share your JS project for beginners like me? Thanks!
https://github.com/hltcoe/patapsco/issues/38#issuecomment-1080723547
I tested in a web browser by just typing the URL with the query at the end.
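For anyone who wants to build that URL programmatically, here is a minimal sketch. The path layout `<base>/<lang>/query/<query>` is copied from the JavaScript earlier in this thread and may reflect a proxy rather than the stock patapsco-web routes; `buildQueryUrl` and the localhost address are hypothetical. The query is percent-encoded so that spaces, slashes, and non-ASCII text survive as a single path segment.

```javascript
// Build a query URL of the form <base>/<lang>/query/<query>.
// The layout is an assumption taken from the JS snippet in this thread.
function buildQueryUrl(baseUrl, lang, query) {
  const base = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  return `${base}/${encodeURIComponent(lang)}/query/${encodeURIComponent(query)}`;
}

// Hypothetical local address; port 9090 matches the --port example above.
const queryUrl = buildQueryUrl('http://localhost:9090/', 'zho', 'machine translation');
console.log(queryUrl);
```

The resulting string can be pasted into a browser or passed to `fetch`.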
The plain text query is parsed in the same way the documents were parsed (i.e., normalized, stemmed or not, etc.). Does that answer the question?
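To illustrate why the query and the documents go through the same processing, here is a toy sketch. The `analyze` function is a stand-in that only applies Unicode NFKC normalization, lowercasing, and whitespace splitting; it is not Patapsco's pipeline, which is determined by the configuration used to build the index. The point is only that terms match when both sides are analyzed identically.

```javascript
// Toy analyzer: Unicode-normalize, lowercase, and split on whitespace.
// A stand-in for whatever normalization or stemming the index was built with.
function analyze(text) {
  return text.normalize('NFKC').toLowerCase().split(/\s+/).filter(Boolean);
}

// Index time: analyze the document once (note the full-width letters).
const docTerms = new Set(analyze('Information  Retrieval ＴＥＳＴ'));

// Query time: run the query through the same analyzer so surface forms agree.
const queryTerms = analyze('retrieval Test');
const matched = queryTerms.filter((term) => docTerms.has(term));
console.log(matched);
```

If the query skipped the analyzer, `Test` would not match the full-width `ＴＥＳＴ` stored in the document.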
How does Patapsco parse queries? In particular, when you send a query to the web service, is it parsed as a Lucene query, or something else?
The context is that I'm thinking about ways to handle queries on a combined traditional and simplified Chinese corpus.
Are parameters of the retrieval in the web service controlled by the "queries" and "retrieve" clauses of the config file?