orhun / rustypaste

A minimal file upload/pastebin service.
https://blog.orhun.dev/blazingly-fast-file-sharing
MIT License
757 stars 47 forks source link

Use async for filesystem hits #246

Open RealOrangeOne opened 6 months ago

RealOrangeOne commented 6 months ago

Except Directory, as there's no async-compatible glob implementation

Description

Use tokio::fs when interacting with the filesystem.

Motivation and Context

This frees the event loop up for doing other things when waiting for the filesystem.

How Has This Been Tested?

The compiler doesn't complain, and very little else has changed.

I'd try the unit tests, but those fail on master, let alone this branch.

Changelog Entry

Types of Changes

Checklist:

RealOrangeOne commented 6 months ago

I've not done any benchmarks from this, but I'd expect most of the benefit to come in throughput as opposed to raw latency. Adding more yields adds more overhead to a given request, but allows the server to handle other requests at the same time. I doubt rustypaste is used in any particularly high-throughput deployments, but can't hurt to be that little more efficient. It also means more concurrent requests can be handled by fewer workers.

orhun commented 6 months ago

Sounds good to me. Clippy is failing, can you check?

I think this PR will be ready to merge when the CI is green!

tessus commented 6 months ago

I have noticed that the tests fail - in-code and fixtures.

tessus commented 6 months ago

what's up with this PR? The tests fail and it should be merged so that I can work on the fix for files with the same name.

Can we get this moving again? First the tests have to pass.

RealOrangeOne commented 6 months ago

Even the master test suite fails locally for me, so I don't trust my local environment (or there's something else weird going on). I think I found the root cause of the main outstanding failures, which I've just fixed. Hopefully, the tests now pass properly

tessus commented 6 months ago

Even the master test suite fails locally for me,

You are correct. I've opened an issue #253 to track this issue. However, there are only 2 tests failing in git master. But I recall there were quite a lot failing here.

RealOrangeOne commented 6 months ago

There are only 3 failing here, all looking like they may be related. I'll rebase and see if that resolves anything.

tessus commented 6 months ago

There is still one test case failing. Not sure why that is when it's ok in git master: https://github.com/orhun/rustypaste/actions/runs/8167143469/job/22327067458#step:5:517

P.S.: I can try running your branch locally and check out the tests. But I am way past my bed time. I'll have to do that tomorrow.

RealOrangeOne commented 6 months ago

Possibly a race in the IO, or something else behaving weirdly. I'll try and look this evening.

tessus commented 6 months ago

I ran the tests in your branch several times locally and there was never an error. All tests passed. @orhun any chance you could try the tests in this branch locally?

I also looked at the test and it is a bit confusing. It converts the body of the response (which is the link that is returned from rustypaste) into bytes. Why the conversion? Anyway, the same link is supposed to be returned for a file that has the same contents. I ran a rustypaste test instance (master branch) and uploaded the same file twice (with duplicate_files = false) and I got 2 different links returned. That should not have happened. But when I run the tests all is fine. How is this possible? This makes no sense.

Maybe orhun has a better crasp on this, since he wrote the code. I'm currently majorly confused.

RealOrangeOne commented 6 months ago

I can't reproduce the issues with cargo test (running that case specifically), but cargo tarpaulin does show an issue. I wonder if it's something to do with how tarpaulin runs, or something around concurrency. I'm not convinced it's related to this PR.

tessus commented 6 months ago

I looked at the code and I am even more confused now than I was before.

It seems that duplicate_files only works iff you do not send an expiry date and there is no default_expiry set in the config. I am not sure whether this was intended or maybe I am msreading the code. It also still doesn't explain why the tests fail on gh, but not locally.

orhun commented 6 months ago

For me all the tests pass locally. I'm not sure what's wrong with the CI :/

I also looked at the test and it is a bit confusing. It converts the body of the response (which is the link that is returned from rustypaste) into bytes. Why the conversion?

I'm guessing I had hard time converting body to a string (due to Pin and stuff) so gave up and just used bytes. But yeah, we should probably not do that.

It seems that duplicate_files only works if you do not send an expiry date and there is no default_expiry set in the config.

I think this was intended - since the other file has an expiry date it is "not the same" as the other file. It should be nice to document this behavior though.

tessus commented 5 months ago

I think this was intended - since the other file has an expiry date it is "not the same" as the other file. It should be nice to document this behavior though.

I don't really understand how to actually create a file that does not expire. Is it only possible, if no default is set? what about sending expiry as -1 or 0? Maybe something to add at one point? I never tried to create a file w/o expiration. Maybe it already works that way.

Even though the expiry date is not the same, the content of the file is the same. This is what tripped me up in my understanding.

But yeah, we should probably not do that.

other tests use string comparison, so it should be an easy fix. Still, it probably won't change the fact that it weirdly fails on gh.

tessus commented 5 months ago

other tests use string comparison, so it should be an easy fix. Still, it probably won't change the fact that it weirdly fails on gh.

ok, I see what the problem was and still is. I guess there's no way around it then. Sometimes I think that rust makes certain things overly complicated and not very intuitive.

tessus commented 5 months ago

It almost seems that on gh either one of the following is ignored:

P.S.: The fixture also passes, so this is really weird.

tessus commented 5 months ago

I ran a few tests here #259 and it seems that tarpaulin has a side effect and makes that particular test fail.

I've opened an issue with tarpaulin: https://github.com/xd009642/tarpaulin/issues/1488

RealOrangeOne commented 5 months ago

So how can we progress this PR? It seems silly to hold it off because the coverage tool has a bug. Should we remove the coverage checker, merge this in, then create a separate PR to add it back?

orhun commented 5 months ago

I switched to cargo-llvm-cov (#260) until the issues with cargo-tarpaulin are fixed. Everything should be fine now 🤞🏼

orhun commented 5 months ago

I just realized an issue with this PR, hot-reloading the configuration file does not work for me. Just run cargo run and update config.toml - I expect to see "Configuration has been updated." and new values taking place but something is not working.

RealOrangeOne commented 5 months ago

Nice to see the tests passing!

Looks like it's not working because the watcher thread can't get an exclusive lock on the config - I think something is holding onto it for too long. I'll investigate.

tessus commented 5 months ago

Since my PR was merged, an .await haa to be added to the glob_match_file.

However, something is wrong after you force pushed. The changes that were merged from master by orhun are no longer in this PR.

tessus commented 5 months ago

@RealOrangeOne can you merge the changes in master back into this PR, change the one call to use .await, so that we can run the tests. Or you can give me push access to your branch and I do it.

P.S.: I think your force-push created a conflict.

tessus commented 5 months ago

The tests in paste.rs are missing the .await for the store_file call. What is strange though is that this was not caught by the lints.

P.S.: Why on earth would cargo clippy --tests --verbose -- -D warnings not catch this? How could the tests have been even compiled?

RealOrangeOne commented 5 months ago

Which ones? I've looked and all of them seem to call .await

tessus commented 5 months ago

Wait, I might have checked the wrong branch. Give me a sec. One test fails though.

tessus commented 5 months ago

ok, so I ran your branch locally and the tests pass. But on the ci the following happens:

thread 'paste::tests::test_paste_data' panicked at src/paste.rs:346:9:
tessus commented 5 months ago

I will run a few tests later this evening.

tessus commented 5 months ago

I ran a bunch of different modified tests and I am at a loss. I am stuck. No idea what is going on.

tessus commented 5 months ago

@orhun have you been able to figure out what the issue could be? it seems this is blocking a new release for a while now and the other PRs haven't been merged yet either.

orhun commented 5 months ago

I managed to reproduce the test failure locally!

Just use non-async calls in tests.

Details ```diff diff --git a/src/paste.rs b/src/paste.rs index 7ae5bb9..4a088a0 100644 --- a/src/paste.rs +++ b/src/paste.rs @@ -287,6 +287,7 @@ mod tests { use awc::ClientBuilder; use byte_unit::Byte; use std::env; + use std::fs; use std::str::FromStr; use std::time::Duration; @@ -307,14 +308,14 @@ mod tests { type_: PasteType::File, }; let file_name = paste.store_file("test.txt", None, None, &config).await?; - assert_eq!("ABC", fs::read_to_string(&file_name).await?); + assert_eq!("ABC", fs::read_to_string(&file_name)?); assert_eq!( Some("txt"), PathBuf::from(&file_name) .extension() .and_then(|v| v.to_str()) ); - fs::remove_file(file_name).await?; + fs::remove_file(file_name)?; Ok(()) } @@ -335,10 +336,10 @@ mod tests { type_: PasteType::File, }; let file_name = paste.store_file("foo.tar.gz", None, None, &config).await?; - assert_eq!("tessus", fs::read_to_string(&file_name).await?); + assert_eq!("tessus", fs::read_to_string(&file_name)?); assert!(file_name.ends_with(".tar.gz")); assert!(file_name.starts_with("foo.")); - fs::remove_file(file_name).await?; + fs::remove_file(file_name)?; config.paste.random_url = Some(RandomURLConfig { length: Some(4), @@ -351,10 +352,10 @@ mod tests { type_: PasteType::File, }; let file_name = paste.store_file(".foo.tar.gz", None, None, &config).await?; - assert_eq!("tessus", fs::read_to_string(&file_name).await?); + assert_eq!("tessus", fs::read_to_string(&file_name)?); assert!(file_name.ends_with(".tar.gz")); assert!(file_name.starts_with(".foo.")); - fs::remove_file(file_name).await?; + fs::remove_file(file_name)?; config.paste.random_url = Some(RandomURLConfig { length: Some(4), @@ -367,9 +368,9 @@ mod tests { type_: PasteType::File, }; let file_name = paste.store_file("foo.tar.gz", None, None, &config).await?; - assert_eq!("tessus", fs::read_to_string(&file_name).await?); + assert_eq!("tessus", fs::read_to_string(&file_name)?); assert!(file_name.ends_with(".tar.gz")); - fs::remove_file(file_name).await?; + fs::remove_file(file_name)?; Ok(()) } @@ -386,9 +387,9 @@ mod tests { type_: PasteType::File, }; let file_name = paste.store_file(".foo", None, None, &config).await?; - assert_eq!("xyz", fs::read_to_string(&file_name).await?); + assert_eq!("xyz", fs::read_to_string(&file_name)?); assert_eq!(".foo.txt", file_name); - fs::remove_file(file_name).await?; + fs::remove_file(file_name)?; config.paste.default_extension = String::from("bin"); config.paste.random_url = Some(RandomURLConfig { @@ -401,14 +402,14 @@ mod tests { type_: PasteType::File, }; let file_name = paste.store_file("random", None, None, &config).await?; - assert_eq!("xyz", fs::read_to_string(&file_name).await?); + assert_eq!("xyz", fs::read_to_string(&file_name)?); assert_eq!( Some("bin"), PathBuf::from(&file_name) .extension() .and_then(|v| v.to_str()) ); - fs::remove_file(file_name).await?; + fs::remove_file(file_name)?; Ok(()) } @@ -436,9 +437,9 @@ mod tests { &config, ) .await?; - assert_eq!("tessus", fs::read_to_string(&file_name).await?); + assert_eq!("tessus", fs::read_to_string(&file_name)?); assert_eq!("fn_from_header.txt", file_name); - fs::remove_file(file_name).await?; + fs::remove_file(file_name)?; config.paste.random_url = Some(RandomURLConfig { length: Some(4), @@ -458,9 +459,9 @@ mod tests { &config, ) .await?; - assert_eq!("tessus", fs::read_to_string(&file_name).await?); + assert_eq!("tessus", fs::read_to_string(&file_name)?); assert_eq!("fn_from_header", file_name); - fs::remove_file(file_name).await?; + fs::remove_file(file_name)?; Ok(()) } @@ -476,8 +477,7 @@ mod tests { PasteType::Oneshot .get_path(&config.server.upload_path) .expect("Bad upload path"), - ) - .await?; + )?; let paste = Paste { data: vec![116, 101, 115, 116], @@ -491,8 +491,8 @@ mod tests { .get_path(&config.server.upload_path) .expect("Bad upload path") .join(format!("{file_name}.{expiry_date}")); - assert_eq!("test", fs::read_to_string(&file_path).await?); - fs::remove_file(file_path).await?; + assert_eq!("test", fs::read_to_string(&file_path)?); + fs::remove_file(file_path)?; Ok(()) } @@ -511,8 +511,7 @@ mod tests { PasteType::Url .get_path(&config.server.upload_path) .expect("Bad upload path"), - ) - .await?; + )?; let url = String::from("https://orhun.dev/"); let paste = Paste { @@ -524,8 +523,8 @@ mod tests { .get_path(&config.server.upload_path) .expect("Bad upload path") .join(&file_name); - assert_eq!(url, fs::read_to_string(&file_path).await?); - fs::remove_file(file_path).await?; + assert_eq!(url, fs::read_to_string(&file_path)?); + fs::remove_file(file_path)?; let url = String::from("testurl.com"); let paste = Paste { @@ -548,8 +547,7 @@ mod tests { PasteType::Url .get_path(&config.server.upload_path) .expect("Bad upload path"), - ) - .await?; + )?; let url = String::from("https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg"); let mut paste = Paste { @@ -570,15 +568,14 @@ mod tests { "70ff72a2f7651b5fae3aa9834e03d2a2233c52036610562f7fa04e089e8198ed", util::sha256_digest(&*paste.data)? ); - fs::remove_file(file_path).await?; + fs::remove_file(file_path)?; for paste_type in &[PasteType::Url, PasteType::Oneshot] { fs::remove_dir( paste_type .get_path(&config.server.upload_path) .expect("Bad upload path"), - ) - .await?; + )?; } Ok(()) ```

Results in a bunch of failures (same as CI) for me. I think it is a spurious async issue.

it seems this is blocking a new release for a while now and the other PRs haven't been merged yet either.

Do you mean the dependabot PRs? Is there any other PR planned for the release on your side?

RealOrangeOne commented 5 months ago

I think it is a spurious async issue.

I suspect you're right. CI runs tests in 1 thread, but it's possible it still runs multiple cases in the same event loop?

The only way I can think for trying to work through these is to have each test case run in its own temporary directory instead, which would prevent them from interracting - which I suspect is what's happening here, but obviously can't prove it.

What should the next step be? It'd be great to get this in!

orhun commented 5 months ago

I think we can try what you have suggested and try to have a more isolated test environment. I found this which is pretty interesting. Might not be needed though.

@tessus thoughts?

tessus commented 5 months ago

Do you mean the dependabot PRs? Is there any other PR planned for the release on your side?

Yep, I meant those. And nope, nothing planned from my side. I also noticed that the CI is throwing warnings, but this has nothing to do with a release.

I found this which is pretty interesting. Might not be needed though.

This is actually pretty awesome. This also means, we could run the tests with multiple threads then, correct? Isolating tests is always a good idea. IMO this should be the default for tests anyway with a keyword to allow tests to not be isolated iff needed. Not sure why that wasn't implemented in Rust core.

@tessus thoughts?

The only issue I have with it is that I don't understand why the tests fail. In https://github.com/orhun/rustypaste/discussions/263 I've tried nextest but also changing the test, which turned out to result in another problem (stalling/timeout, which makes no sense either). Maybe I am missing something, because I am not a Rust expert, but from what I can ascertain from a logical point of view is that they shouldn't fail.

orhun commented 5 months ago

Yeah, I still think that making some things async broke some internal concurrency guarantees and now the tests are failing spuriously because of that. We can proceed with making things isolated and see if that works.

RealOrangeOne commented 5 months ago

sealed_test doesn't seem to mix well with actix_rt::test. I've implemented it manually, and the tests seem to pass more reliably locally. It's still not perfect, and still requires --test-threads 1 for some reason, but I think digging into that further is probably out of scope for this PR.

orhun commented 5 months ago

Yeah, as long as the CI passes I think we are set. It is still failing for some freaking reason though 💀

tessus commented 5 months ago

So what is the plan here? The issue is if we merge this all future tests will fail.

It's possible that there are some issues with test, actix, and async. I have no idea. But something is off. These tests should not fail. Maybe there's a bug in actix or cargo test. Either way, can we not hold off the release any longer?

auto-merge can be setup, the PRs merged (except this one), and a new release be cut. Unfortunately I can't do any of this myself.

orhun commented 5 months ago

Either way, can we not hold off the release any longer? auto-merge can be setup, the PRs merged (except this one), and a new release be cut.

Yeah, I think I'm going to move forward with the release this week, without this PR :/

orhun commented 5 months ago

Someone suggested assay for isolated testing FWIW

tessus commented 5 months ago

This project seems dead. I understand that if there are no dependencies, or a project does not have anything to add because it is feature complete, or there are no bugs, it is possible that there are no commits. But a PR has been ready to be merged for 1.5 years and the maintainer just ignored it after the OP addressed all comments.

This is not something you want to use. Well, at least I wouldn't.

tessus commented 5 months ago

I rebased to master in this test: #272 and it seems that the tests pass now in CI. They also pass locally. Maybe @RealOrangeOne you can rebase this PR as well.

tessus commented 5 months ago

@orhun something is seriously wrong here. the PR now has the exact same code as in #272

tessus commented 5 months ago

@RealOrangeOne I think we have to try something else.

Can you create a new PR with the current changes?

  1. save the changed files
  2. create a new branch off master
  3. move the saved files back

Not using any git commands for step 1 and 3 is on purpose.

orhun commented 5 months ago

Maybe there is a cache issue? We can try clearing the cache first.

orhun commented 5 months ago

Ok cleared the cache, but that didn't really do anything.