Rewrite discussion - Githubissues

leonhard-s commented 4 years ago

I feel like picking up the threads where I left off some months ago: a full rewrite.

There are currently two parallel interfaces, one provides the low-level URL generation stuff (which should support any game with a DBG API), the other is the object-oriented, Pythonic model for PlanetSide 2.

I would like to start over with focussing on the Python end, making sure that is a nice, clean API without weird "query commands go into methods" decisions - the end user within Python shouldn't even have to think about query commands existing.

My instinct would be "add unit tests, fix the bugs, start rewriting and keep testing using the shiny new unit tests", but only if there is shared interest in making the API nicer to use.

qcoumes commented 4 years ago

Even with an object-oriented pythonic model, a more low-level API will be needed to retrieve the data from census. I think that the current implementation you wrote (Query and Join) is already very good.

The main problem for me with the API is the lack of tests.

Once the API is correctly tested, we can start creating python objects for main planetside2's structure (Player, Weapon, Item, Vehicles, Outfit...). The biggest problem, since the census API can be quite slow, will be to decide which data should be retrieved immediately, and which one on-demand (can be done easily thought @property).

For instance, we can't retrieve everything about a Character (items, stats, outfit, etc...) just by instantiating it, it will take way too much time.

LordFlashmeow commented 4 years ago

~~I've already put together a batch of tests, let me send in a PR.~~ Most of my tests are checking features that haven't been merged yet, so I'll hold off on them.

LordFlashmeow commented 4 years ago

weird "query commands go into methods" decisions - the end user within Python shouldn't even have to think about query commands existing.

While I agree with this in principle, I've found that the API has a lot of power and flexibility under the hood that would be very hard to capture with object-oriented methods. In one of my projects, I'm getting recent kill data, but only for infantry weapon kills. By using an inner join, I can get more (relevant) events by filtering out events that don't match the terms. Out of ~250 lines of code for the project, there are 5 lines using the library, making 3 queries in total. The rest is calculating and printing statistics.

Personally, I think that as an API wrapper, the goal is to make the API accessible through python so that programmers can focus on writing effective, understandable queries and having easy to manage data come back. With regards to that, the wrapper is 95% of the way there.

I think there's a line that needs to be investigated between making simple queries easy to write and spending our time guessing what other developers want and doing it for them.

leonhard-s commented 4 years ago

The biggest problem, since the census API can be quite slow, will be to decide which data should be retrieved immediately, and which one on-demand (can be done easily thought @property).

In my previous implementations of the object model (none of which I was happy with), some attributes would spawn sub-queries that would fill in the given data and then pretend nothing happened. I liked the syntax, but the API latency made it rather annoying to use.

That is why I wanted to rewrite the wrapper using a non-blocking HTTP library. requests is terrific, but it is not ideal for queries that can take a second or two to respond (at least from EU). I looked into aiohttp, which would have introduced a bunch of async creep throughout the wrapper.

We could modify the query system to no longer carry out requests. Instead, it will generate nicely formatted query strings (maybe using the furl library) that the user can then use with requests if they prefer not to deal with asyncio, or they can use the object model which would use asyncio to provide an easy, responsive interface for larger projects or Discord bots (the latter is a big use case imo).

leonhard-s commented 4 years ago

Here is a snippet from my previous WIP version of the object model.

I had a cache attached to any static object types (such as factions, vehicles, weapons, etc.) that kept them around for a while (depending on how big the collection was and how often it was accessed).

In the example below, only the first Character.faction getter for each faction ID would result in a query, after that the caching system would intercept the constructor and return the existing instance instead:

class Character(CensusTable):
    ...

    @property
    def faction(self):
        """Return the faction of the character."""
        return Faction.get(id_=self._faction_id)

    @staticmethod
    def get_by_name(name, ignore_case=True):
        """Return a character by name."""
        # Generate request
        query = Query(collection='character')
        if ignore_case:
            query.add_term(field='name.first_lower', value=name.lower())
        else:
            query.add_term(field='name.first', value=name)
        data = query.get(single=True)
        if not data:
            raise NoMatchesFoundError
        # Retrieve and return the object
        instance = Character.get(id_=data['character_id'], data=data)
        return instance

    ....

I still believe that this is an elegant way of providing the data to a user (depending on the workload, the user could customize the caches for memory footprint vs. query count).

But this still generates a fair number of sub-queries depending on what the user is doing and would require async to make it responsive. However, there is no such thing as an asynchronous @property, so we would lose that syntactic sugar again.

At that point, we might as well use getter methods, which would also give a nice distinction between things that don't cause traffic (properties) and things that do (methods).

// Edit: While asynchronous properties are not natively supported, there is a project that enables them on PyPI. Might be an option? https://pypi.org/project/async-property/

leonhard-s commented 4 years ago

I will start refactoring the module a little, namely into three components:

URL generation and validation (generic)
Request handling and error checking (generic)
Object model (ps2 only)

I will try to keep the generic modules compatible with games other than PlanetSide 2, but I will also not go out of my way to uphold compatibility if it hamstrings the PS2 side of things.

qcoumes commented 4 years ago

but I will also not go out of my way to uphold compatibility if it hamstrings the PS2 side of things

Since auraxium refers to planetside 2, it make sense.

leonhard-s commented 4 years ago

I just pushed 8ea24a17 onto the rewrite branch. It is not tested and probably has some typos or omissions, but it's a start.

This new implementation of the census module will not perform queries, it will only generate the URLs. This opens up the choice of HTTP library for the user; lets us go full-on async creep in the object model without forcing users to do the same.

So yeah - any feedback would be appreciated, I want to finish this module off before I delve into the object model madness (I have a WIP model, but I needed query generation to be able to test it for usability).

qcoumes commented 4 years ago

I'll try to look at that tomorrow

qcoumes commented 4 years ago

Seems like a good start to me, I like the way yarl works.

leonhard-s commented 4 years ago

Yeah, it also made the new unit tests (d3988b5f) a lot easier to write. 😄

What do you think about the method naming, especially for query commands? I don't want to have to make the attributes private or hide them in some other namespace to avoid collisions with the setter methods.

That is why the names are so irregular right now, I wanted to see which feels right. I don't really want to stray too far from the native API names in this module, hence all the terrible set_<attr> names.

We could do something sneaky with descriptors, such that calling an attribute invokes the setter (thus letting the user chain arguments as before), but for value assignment and retrieval the object would still behave like an attribute. A bit like @property, except that we could call our descriptor objects as well.

leonhard-s commented 4 years ago

I pushed a tentative version of the attribute descriptors onto the rewrite branch but didn't end up liking it a whole lot. I replaced it with a public, typed dictionary for clarity.

The current rewrite commit (3ee9589) is more or less final as far as I'm concerned, at least until I can start testing it with real-world queries and all. Still needs better test coverage, especially for nested joins and other funky URL quirks like that; the "wrong separator for inner joins" issue would not be caught by this version.

I'll get back to playing with the object model next, not sure how much time I'll find this week.

qcoumes commented 4 years ago

I have some free time in the coming weeks if you need some help

leonhard-s commented 4 years ago

Hey, thanks for the offer. Sorry I slowed down a bit lately, but I am finally (close to) done with implementing the object model. Soon™️

The event streaming client is still missing, but that should be fairly quick to implement.

The other thing I am still unhappy with are directive checks or statistics. There are methods to access them for now, but that is slightly missing the point of abstracting away the API collections themselves. I am thinking about adding an external statistics module ("retrieve stats for these characters", that module can then use optimised queries to keep the traffic down).

I'll try to clean up some of the rougher edges and add documentation over the weekend and add some examples.

leonhard-s commented 4 years ago

The weekend became a week, but I finally got around to tidying up a bit.

The rewrite branch is now the primary branch for the repository. The old implementation can still be found in the legacy branch for reference, though people looking for that kind of interface might want to switch to ps2-census instead.

The URL generator can be used standalone and resides in the auraxium.census module. It is currently also the only one covered over in Read the Docs until I get around to cleaning up the object model docstrings.

With that, the roadmap looks something like this, in descending order of priority (rather than by time):

[x] Add the event streaming client
[ ] ~~Make the object model docstrings compliant with Sphinx docs~~
[ ] ~~Add properties for common fields (rather than having to go through PS2Object.data.<attr>)~~
[ ] ~~Find a neater way to deal with directive or statistics checks~~
[ ] ~~Add properties for HSR, ACC and other secondary statistics~~
[ ] ~~Create utility methods for common batch information, mostly to improve performance (player stat overview, etc.)~~

I'll get started on the event streaming client as that's been bugging me. After that it's probably time to revisit the object model, clean up the docs, add properties, ... just make sure it is as convenient as we'd like it to be.

leonhard-s commented 4 years ago

The bulk of the rewrite is mostly complete as far as I'm concerned, I'll be moving the remaining tasks into individual issues.

leonhard-s / auraxium

Rewrite discussion #8