scrapinghub / frontera

A scalable frontier for web crawlers
BSD 3-Clause "New" or "Revised" License

Inconsistent data types within library #411

Open adityaraj-28 opened 1 year ago

adityaraj-28 commented 1 year ago

While working with the SQLAlchemy backend for the DB and strategy workers and the Kafka backend for the spider, I found that frontera.worker.server.WorkerJsonRpcService has this constructor:

class WorkerJsonRpcService(JsonRpcService):
    def __init__(self, worker, settings):
        root = RootResource()
        root.putChild('status', StatusResource(worker))
        root.putChild('jsonrpc', WorkerJsonRpcResource(worker))
        JsonRpcService.__init__(self, root, settings)
        self.worker = worker

In the Resource class, where putChild is defined, I saw this check:

    def putChild(self, path: bytes, child: IResource) -> None:
        if not isinstance(path, bytes):
            raise TypeError(f"Path segment must be bytes, but {path!r} is {type(path)}")

So it always raises the TypeError, because we pass a str as the first argument to putChild while it expects bytes. Changing the code to the following works, but it is not elegant, as I don't want to modify library code just for this use case.

root.putChild(b'status', StatusResource(worker))
root.putChild(b'jsonrpc', WorkerJsonRpcResource(worker))
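
One shape a fix could take, instead of hard-coding bytes literals at every call site, is to encode str path segments before delegating to putChild. The sketch below is only an illustration under stated assumptions: Resource here is a stand-in that mimics the bytes check quoted above so the example runs on its own, and to_bytes and CompatResource are hypothetical names that do not exist in frontera or its dependencies.

```python
class Resource:
    """Stand-in that only mimics the bytes check quoted above (not the real class)."""

    def __init__(self):
        self.children = {}

    def putChild(self, path, child):
        if not isinstance(path, bytes):
            raise TypeError(f"Path segment must be bytes, but {path!r} is {type(path)}")
        self.children[path] = child


def to_bytes(path, encoding="utf-8"):
    """Hypothetical helper: encode str segments, pass bytes through unchanged."""
    if isinstance(path, bytes):
        return path
    if isinstance(path, str):
        return path.encode(encoding)
    raise TypeError(f"Path segment must be str or bytes, got {type(path)}")


class CompatResource(Resource):
    """Hypothetical subclass that accepts either str or bytes path segments."""

    def putChild(self, path, child):
        # Normalize once here so callers written for str keep working.
        super().putChild(to_bytes(path), child)


root = CompatResource()
root.putChild('status', object())    # str is encoded to b'status'
root.putChild(b'jsonrpc', object())  # bytes is passed through as-is
```

With this approach the children end up keyed by bytes in both cases, so lookups behave the same as with the bytes-literal workaround. Whether such a wrapper belongs in RootResource or in the call sites is something to settle in the discussion.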

Happy to raise a PR after discussion!