json-schema-org / community

A space to discuss community and organisational related things

GSoC: `bowtie-perf`: a Performance Tester for JSON Schema implementations #605

Open Julian opened 10 months ago

Julian commented 10 months ago

Project title

bowtie-perf: a Performance Tester for JSON Schema implementations

Brief Description

Bowtie is a tool which provides "universal access" to JSON Schema implementations, giving JSON Schema users a way to use any implementation supported by Bowtie.

A primary use case for Bowtie was to allow comparing implementations to one another, which can be seen on Bowtie's website and which gives anyone a way to see how correct a JSON Schema implementation is relative to the JSON Schema specification.

But it can do more! Let's write a performance tester using Bowtie, giving us a way to also compare performance of implementations by timing how long they take to do their work. This information could be used to do performance optimization, or as a second dimension that users could use when comparing implementations with one another.

Refs: bowtie-json-schema/bowtie#35

Expected Outcomes

Skills Required

Mentors

@Julian

Expected Difficulty Hard

Expected Time Commitment 350 hours

sudo-jarvis commented 9 months ago

Hi @Julian, I would love to work on this as part of GSoC 2024. I have already started familiarising myself with Bowtie and would love to discuss this project further.

Thanks! Very excited to contribute.

Julian commented 9 months ago

Great! This one is quite self-contained -- if you're already familiar with how Bowtie works, my best recommendation here is to start researching ways of doing "generic" performance monitoring of applications.

Specifically, I think eBPF is a good keyword to poke around at and look for some tutorials on!
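For anyone starting on that research, here is a minimal sketch of what eBPF tooling looks like from Python via the bcc library. It assumes a Linux host with bcc installed and root privileges, simply traces execve calls, and is only meant to show the flavour of the approach, not anything Bowtie-specific.

```python
from bcc import BPF

# A tiny eBPF program: print a message every time the execve syscall fires.
program = r"""
int hello(void *ctx) {
    bpf_trace_printk("execve called\n");
    return 0;
}
"""

b = BPF(text=program)
b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="hello")
b.trace_print()  # stream trace output until interrupted
```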

sudo-jarvis commented 9 months ago

Thanks @Julian, I will start researching and learning about it right away.

ashmit-coder commented 9 months ago

Hey @Julian, I am Ashmit Jagtap from the Indian Institute of Information Technology, Pune. I would love to work on this project under GSoC 2024. I have been contributing to AsyncAPI and have also been making some PRs in the Bowtie project. I will start researching the things we may need for the project and will keep you posted on any queries/improvements I may have.

matthewjselby commented 9 months ago

Hi @Julian 👋 I also wanted to express my interest in working on this project. I've been learning a lot about perf, eBPF, and other profiling/tracing tools as part of a systems software class I'm taking this semester. I'm proficient in Python and have experimented with JSON Schema in personal projects, so I feel like this is a natural fit for me and would give me the opportunity to put what I've been learning into action. On first glance of the source it looks like each implementation is tested in a container created/started by bowtie, is that correct? Seems like an interesting wrinkle for the performance testing if so. Are you interested in any other performance metrics, or primarily focused on timing?

benjagm commented 9 months ago

Thanks a lot for joining JSON Schema org for this edition of GSoC!!

Qualification tasks will be published as comments in the project ideas by Thursday/Friday of this week. In addition, I'd like to invite you to an office hours session this Thursday at 18:30 UTC where we'll present the ideas and the relevant dates to consider at this stage of the program.

Please use this link to join the session: 🌐 Zoom 📅 2024-02-29 18:30 UTC

See you there!

Julian commented 9 months ago

Hi!

On first glance of the source it looks like each implementation is tested in a container created/started by bowtie, is that correct?

Yep exactly!

Are you interested in any other performance metrics, or primarily focused on timing?

Timing is a first obvious one. Number of instructions would also be interesting certainly.

And some other even more basic ones are in https://github.com/bowtie-json-schema/bowtie/issues/904 !

matthewjselby commented 9 months ago

@Julian - thanks for the info! I've been thinking about this a bit more. In my mind there are two approaches that stick out:

  1. Implementation-agnostic performance testing implemented as part of the core bowtie codebase. For timing, I could see possibly reworking communication between bowtie and implementation containers to be synchronous during runs of bowtie perf to work around the async issues with responses mentioned here, although I wonder how feasible that is or how much of a performance drag it would cause. There may be other ways to accomplish this as well; I'm still thinking about that. For other metrics, I'm not exactly sure how this would work - this could be promising from the host machine, but I don't think it would work if bowtie itself runs in a container, and it would be difficult anyway because of variations in implementations. Further research would be needed.

  2. Implementation-specific performance testing. This could be part of a harness itself (performance data - timing being relatively low-hanging fruit - is collected by the harness and provided as part of its response) or integrated with the individual implementation containers (e.g., uprobes or some other individualized profiling - subject to the same difficulties mentioned above because of variations in implementations). This would forgo the need to modify anything about the way bowtie communicates with implementation containers, but it has to be implemented for each harness, and the code for parsing responses needs to be updated (see the rough sketch at the end of this comment).

In general, additional metrics apart from timing seem pretty complicated to collect due to the variations in implementations, but timing data seems within reach with either approach. I'm interested in your thoughts and whether you have any additional high-level ideas for exploration. Thanks!
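To make the harness-side idea in (2) concrete, here is a rough sketch of a harness main loop that times its own validation work and attaches the result to each response. Everything here (the field names, the handle() placeholder, the line-delimited JSON framing) is illustrative only, not Bowtie's actual harness protocol.

```python
import json
import sys
import time


def handle(request):
    # placeholder for the implementation-specific validation call
    return {"valid": True}


# Hypothetical harness loop: one JSON request per line on stdin,
# one JSON response per line on stdout, with self-reported timing.
for line in sys.stdin:
    request = json.loads(line)

    start = time.perf_counter_ns()
    response = handle(request)
    elapsed_ns = time.perf_counter_ns() - start

    response["elapsed_ns"] = elapsed_ns  # invented field name, for illustration
    sys.stdout.write(json.dumps(response) + "\n")
    sys.stdout.flush()
```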

Julian commented 9 months ago

I like that breakdown a lot! I've definitely got more thoughts, so will have to come back and elaborate further.

On the async bit -- just want to mention 2 possible mitigating factors -- one (which I never went back and edited into the issue you saw) is that we can definitely run only one implementation at a time -- so as long as we structure the actual timing collection correctly we could essentially just run each implementation one at a time and see whether that helps improve our accuracy.

And second, I still hope to do some internal refactoring to make how Bowtie runs implementations more general, so yeah we should keep everything on the table there if it turns out that's helpful for this functionality as well!

I would also stress the benchmark part a bit. I don't think it's critical we have a gigantic suite, but I think a good early part of this will be identifying, say, 10 representative benchmarks which we use to kick the tires and surface interesting performance-adjacent learnings -- here's a recent example that I added as a benchmark to my specific implementation -- https://github.com/python-jsonschema/jsonschema/blob/0bc3b1fe8770195eb9f7e5c0d7d84c7007b9a2a5/jsonschema/benchmarks/useless_keywords.py -- not sure how much sense that will make in isolation, but it's basically: imagine a giant JSON Schema where most of the keywords are useless -- does the implementation understand that it should immediately throw all those keywords away, or does it continuously re-process them every time it does validation?
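For a concrete flavour of that idea, here is a small illustrative sketch (not the linked benchmark itself) that pads a schema with many keywords an implementation should be able to ignore, then times repeated validation with the jsonschema library:

```python
import time

from jsonschema import Draft202012Validator

# A schema where only "type" does any work; the rest is junk an
# implementation should be able to discard once, up front.
useless = {f"useless-keyword-{i}": ["lots", "of", "junk"] for i in range(100_000)}
schema = {"type": "integer", **useless}

validator = Draft202012Validator(schema)

start = time.perf_counter()
for _ in range(1_000):
    validator.is_valid(12)
elapsed = time.perf_counter() - start
print(f"1000 validations took {elapsed:.3f}s")
```

An implementation that discards the junk keywords once when it compiles the schema should barely notice the padding; one that re-walks them on every call will slow down dramatically as the padding grows.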

Will come back with more specific thoughts but from clicking (but not yet reading) your links, you're on the right track for sure, so well done already.

sudo-jarvis commented 8 months ago

@Julian, I have started looking into ways of profiling and monitoring application performance, and these are some of my initial understandings:

  1. As the various implementations supported by Bowtie are very diverse in nature, implementing some sort of implementation-specific testing using language-specific libraries (such as pyperf for Python) seems complicated, as also mentioned by @matthewjselby. Besides the fact that handling performance data in every harness would be tedious and repetitive, a performance testing library might simply not exist for a given language. But we could still put something like this in place if the other method turns out to be even more complicated. Currently our harnesses support four types of requests - start, stop, run, and dialect. We could add a fifth one, maybe called benchmark: this request would set a benchmark flag to true, similar to how the dialect request sets the default dialect, and if the flag is on the harness would return benchmark data as well, which Bowtie can then aggregate (a rough sketch of what this could look like is shown after this list).

  2. Making Bowtie itself support profiling the various implementations (i.e. not expecting performance data to be returned by the harness itself) seems to be a better option, as it would mean that bowtie perf is generic and remains independent of any specific implementation.

    1. However, this approach seems to have its own challenges and drawbacks. For example, tools like bpftrace and bcc (which are based on eBPF), or even perf, could work from the host machine - i.e. treat every harness as a process running inside a container and then trace that specific process - but this would mean that Bowtie must be running on a host that supports eBPF, which might simply not be the case if it's running on Windows or macOS, or even inside another container.

    2. The above point could be a problem for other metrics. Timing should still be easy to measure, I guess, by ensuring that the bowtie perf command runs each implementation one by one and measures the time taken for each response to return. But I'm not sure how to proceed for other metrics, such as the number of instructions executed.

  3. Once we have decided on the system we will use to measure the metrics, we can build a default benchmark suite covering the critical aspects that can affect performance, similar to the one you mentioned you made for python-jsonschema.
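A purely hypothetical sketch of what the request/response shapes in point 1 could look like, written as Python dicts: the command name and every field below are invented for illustration and are not part of Bowtie's actual harness protocol.

```python
# Invented "benchmark" command, mirroring how "dialect" configures a harness.
benchmark_request = {
    "cmd": "benchmark",   # hypothetical command name
    "benchmark": True,    # hypothetical flag the harness would remember
}

# With the flag on, a harness could attach timing data to each run response.
run_response = {
    "seq": 1,
    "results": [{"valid": True}],
    "benchmark": {"elapsed_ns": 152_340},  # hypothetical field
}
```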

Would love to know your thoughts on the same.

Julian commented 8 months ago

Great!

I would definitely focus on the second kind first -- i.e. performance and profiling that is more language-agnostic even if less granular. We can always get to doing both.

It's true that the story is better on Linux than on other OSes there. I think that's probably fine, certainly to start, as that's generally the main OS where people run "real applications", which is where someone is more likely to care about performance. But there are some even cruder things we can do which are also OS-agnostic, like pure timing numbers, so that's almost certainly the easiest first target regardless.

sudo-jarvis commented 8 months ago

@Julian, based on what I have learned, there are two broad ways in which we could implement time measurement while running implementations:

1. Using an existing profiling library such as cProfile or timeit:

cProfile: This would be easy to use, but it offers less customization, and most of the details the profiler produces won't be relevant to our use case: most of the work happens inside the container rather than in the main Python program itself, so the things it identifies (individual function calls, the time taken by each of them, etc.) would not reflect the implementation's work.

timeit: This module provides a simple way to time small bits of Python code. It natively supports running the code a specified number of times and reporting the total time taken, from which an average is easy to derive (a tiny example follows). However, customization might be a problem here too: when running timeit on a function we can only measure time, so for other metrics we would need to run the implementation again.
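As a small reference for the timeit behaviour described above, the validate_once function here is just a stand-in for sending one request to a harness and waiting for its response:

```python
import timeit


def validate_once():
    # stand-in for one request/response round trip to a harness
    sum(range(1_000))


# Run the callable 10,000 times; timeit.timeit returns the total wall-clock time.
total = timeit.timeit(validate_once, number=10_000)
print(f"total: {total:.4f}s, average per call: {total / 10_000:.8f}s")
```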

2. Using the basic time.perf_counter() function to measure time with high precision:

Python provides a high-precision clock, time.perf_counter(), for timing purposes. This can be used directly for our use case. Instead of running each implementation only once, we might run it a specified number of times so the observations get averaged out, perhaps also allowing users to change the number of runs during benchmarking.

With this we can start measuring time only after the container has been started, ignoring the time spent starting or closing the containers:

import time

async def time_one_run():
    await start_container()  # container startup is excluded from the measurement

    start = time.perf_counter()
    # send the request to the harness container and await its response here
    end = time.perf_counter()

    await close_container()  # container shutdown is excluded as well

    return end - start  # elapsed time covers only the request/response work

References:

  1. cProfile: https://docs.python.org/3/library/profile.html#module-cProfile
  2. timeit: https://docs.python.org/3/library/timeit.html
  3. time module: https://docs.python.org/3/library/time.html#time.perf_counter

Do you have any additional insights on this?

Julian commented 8 months ago

perf_counter(_ns) is definitely fine as a start! As you say we're not trying to profile bowtie itself, so we probably can get a decent amount of mileage out of simple timing numbers and then eBPF (and/or instruction counts) as a second step.

benjagm commented 8 months ago

🚩 IMPORTANT INSTRUCTIONS REGARDING HOW AND WHERE TO SUBMIT YOUR APPLICATION 🚩

Please join this discussion in the JSON Schema Slack to get the latest, very important details on how to best submit your application to JSON Schema.

See communication here.

github-actions[bot] commented 5 months ago

Hello! :wave:

This issue has been automatically marked as stale due to inactivity :sleeping:

It will be closed in 180 days if no further activity occurs. To keep it active, please add a comment with more details.

There can be many reasons why a specific issue has no activity. The most probable cause is a lack of time, not a lack of interest.

Let us figure out together how to push this issue forward. Connect with us through our Slack channel: https://json-schema.org/slack

Thank you for your patience :heart: