simonw / datasette-ripgrep

Web interface for searching your code using ripgrep, built as a Datasette plugin
https://ripgrep.datasette.io
Apache License 2.0

Terminate process early once the desired number of results have been returned, plus set time limit #3

Closed amitu closed 3 years ago

amitu commented 3 years ago

You stop reading after max_lines, but you let the process keep running and wait for it to finish.

I haven't run it, or Datasette for that matter. I was just curious and wanted to get this clarified.

simonw commented 3 years ago

Yes, definitely. I'm still trying to figure out the best way to manage the process.

simonw commented 3 years ago

I want to set a time limit too so that expensive queries don't cause performance problems.

simonw commented 3 years ago

Useful demo (in ipython):

import asyncio
proc = await asyncio.create_subprocess_shell(
    "sleep 5; echo 'hello'; " * 5,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE
)
while True:
    print(await proc.stdout.readline())

This works as expected for the first 25 seconds, then goes into an infinite loop outputting b'', because readline() returns an empty bytes object once the stream reaches EOF.
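
A minimal sketch of the same read loop with an EOF check, so it stops instead of spinning on b''. The helper name read_lines_until_eof and the Python one-liner used as the child command are hypothetical, chosen only to keep the example self-contained:

```python
import asyncio
import sys


async def read_lines_until_eof(*cmd):
    """Run cmd and collect its stdout lines, stopping at EOF."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
    )
    lines = []
    while True:
        line = await proc.stdout.readline()
        if line == b"":
            # readline() returns b"" once the pipe reaches EOF
            break
        lines.append(line)
    await proc.wait()  # reap the finished process
    return lines


if __name__ == "__main__":
    # Hypothetical child command: prints two lines and exits
    child = [sys.executable, "-c", "print('hello'); print('world')"]
    print(asyncio.run(read_lines_until_eof(*child)))
```

Outside ipython there is no top-level await, which is why the sketch goes through asyncio.run().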

simonw commented 3 years ago

proc.kill() seems to do the right thing against this too.
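
A rough sketch of what stopping early could look like: read a fixed number of lines, then kill the child rather than waiting for it to finish. CHILD and first_lines_then_kill are hypothetical names, and the child here is a Python loop standing in for a long-running search:

```python
import asyncio
import sys

# Hypothetical child: prints a counter forever, one line every 10 ms
CHILD = """
import itertools, time
for i in itertools.count():
    print(i, flush=True)
    time.sleep(0.01)
"""


async def first_lines_then_kill(max_lines):
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD,
        stdout=asyncio.subprocess.PIPE,
    )
    lines = []
    while len(lines) < max_lines:
        line = await proc.stdout.readline()
        if line == b"":
            break  # child exited on its own
        lines.append(line)
    try:
        proc.kill()
    except ProcessLookupError:
        # The child had already exited
        pass
    await proc.wait()  # reap the process so it does not linger as a zombie
    return lines, proc.returncode
```

Calling asyncio.run(first_lines_then_kill(3)) should return the first three lines plus a non-zero return code, since the child was killed rather than allowed to exit normally.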

simonw commented 3 years ago

I need asyncio.wait_for() for a time limit:

https://docs.python.org/3/library/asyncio-task.html#asyncio.wait_for

    try:
        await asyncio.wait_for(eternity(), timeout=1.0)
    except asyncio.TimeoutError:
        print('timeout!')
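
The snippet above relies on an eternity() coroutine that is not defined here. A self-contained version, assuming eternity() simply sleeps for a long time (run_with_timeout is a hypothetical wrapper name):

```python
import asyncio


async def eternity():
    # Sleep for one hour; stands in for any slow awaitable
    await asyncio.sleep(3600)


async def run_with_timeout(timeout):
    try:
        # wait_for() cancels eternity() when the timeout expires
        await asyncio.wait_for(eternity(), timeout=timeout)
    except asyncio.TimeoutError:
        return "timeout!"
    return "finished"


if __name__ == "__main__":
    print(asyncio.run(run_with_timeout(1.0)))
```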

simonw commented 3 years ago

import asyncio
proc = await asyncio.create_subprocess_shell(
    "sleep 5; echo 'hello'; " * 5,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE
)
try:
    await asyncio.wait_for(proc.stdout.readline(), timeout=1.0)
except asyncio.TimeoutError:
    print('timeout!')
    proc.kill()

simonw commented 3 years ago

This seems to do the job in local testing:

import asyncio
import json


async def run_ripgrep(pattern, path, time_limit=3.0, max_lines=1000):
    proc = await asyncio.create_subprocess_exec(
        "rg",
        pattern,
        path,
        "--json",
        stdout=asyncio.subprocess.PIPE,
        stdin=asyncio.subprocess.PIPE,
    )

    async def inner(results):
        while True:
            line = await proc.stdout.readline()
            if line == b'':
                break
            results.append(json.loads(line))
            if len(results) >= max_lines:
                break

    results = []
    time_limit_hit = False
    try:
        await asyncio.wait_for(inner(results), timeout=time_limit)
    except asyncio.TimeoutError:
        time_limit_hit = True
    try:
        proc.kill()
    except ProcessLookupError:
        # The process may already have exited on its own
        pass
    # We should have accumulated some results anyway
    return results, time_limit_hit
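
A sketch of how the returned results might be consumed, assuming ripgrep's --json output, where each line is a message with a "type" field ("begin", "match", "end", "summary" and so on) and match details live under "data". The helper name extract_matches is hypothetical:

```python
def extract_matches(results):
    """Return (path, line_number, text) tuples for "match" messages."""
    matches = []
    for message in results:
        if message.get("type") != "match":
            continue  # skip "begin", "end", "summary" and other messages
        data = message["data"]
        matches.append((
            # "text" can be absent when the value is not valid UTF-8
            data["path"].get("text"),
            data["line_number"],
            data["lines"].get("text"),
        ))
    return matches
```

Because the process can be killed partway through, the list may stop without a closing "end" or "summary" message, so it seems safer to filter by type than to rely on message order.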