Closed aleaksunder closed 3 years ago
Well, the fastest way to speed it up:
C:\src\yoriperf>timethis oneyori.exe -nouser -c test.ys1
Elapsed time: 0:00:00.203
Child CPU time: 0:00:00.203
Child kernel time: 0:00:00.125
Child user time: 0:00:00.078
Tree CPU time: 0:00:00.203
Tree kernel time: 0:00:00.125
Tree user time: 0:00:00.078
(ie., using oneyori for this is 40x faster.)
I learned a few horrifying things about Windows while developing Yori, but I think this issue is hitting at the worst: in Windows, CreatePipe will create a synchronous pipe, and there is no way to know if a synchronous pipe has data available because the expectation of the named pipe system is to wait until there is a blocking write and a blocking read at the same time, then shuttling the data across, then completing the requests. This is bad for two reasons: a) performance is terrible due to context switching so the named pipe system needed to add a buffer which undermines the reasoning here and b) processes can't wait on multiple things simultaneously, which makes it impossible to have a cancelable operation, and threads don't help since a worker thread can't check for data on the pipe and check for any communication from another thread.
So Yori did the best thing it could do. It spins waiting to see if it should cancel or if there's data in the pipe. It can't spin on the CPU (which would destroy performance) so there's a sleep between each spin to let the other process make progress. The length of the sleep increases progressively, because if no data has arrived in several seconds it's unlikely that it will arrive a microsecond later.
But this sequence trips up this logic badly. The first couple of processes make progress; the later processes observe no data is arriving and start sleeping for longer and longer; and when data arrives, they are waiting for tens of milliseconds before noticing that any data is available. This magnifies down the chain, because if one process waits pointlessly for 10 milliseconds everything behind it starts increasing the duration of their sleeps, so the next one waits pointlessly for 20 milliseconds, etc. Amusingly, doing this with oneyori is fast because it won't launch 36 things in parallel - it's executing one command at a time, so there's never a time when a reader is waiting for data for a long time.
I have a tweak for this logic to increase the length of the sleep more slowly, which helps in this scenario, but note this is not a free lunch. Every time the thread wakes up and does nothing, it's slowing down the rest of the system, including the thread that's supposed to write data into the pipe.
Anyway, this algorithm makes it 10x faster, and about half the time compared to the 36 line version, but it's still 4x slower than the serial oneyori version:
diff --git a/lib/lineread.c b/lib/lineread.c
index e985595..2b5271f 100644
--- a/lib/lineread.c
+++ b/lib/lineread.c
@@ -506,7 +506,11 @@ YoriLibReadLineToStringEx(
Sleep(DelayTime);
CumulativeDelay += DelayTime;
- DelayTime *= 2;
+ if (DelayTime < 10) {^M
+ DelayTime++;^M
+ } else {^M
+ DelayTime = DelayTime * 5 / 4;^M
+ }^M
if (DelayTime > 500) {
DelayTime = 500;
}
Now I get it...
I have tested it with oneyori.exe
but get the same results with 12 seconds but that was just because my oneyori.exe
was in the same directory with iconv.exe
and repl.exe
( due to it was compiled in that directory ) and it processes script with externals, not with builtins! Now I moved oneyori.exe
away, tested and it is blazingly fast... and nice to see improvement for that "wait" logic
Let me explain why I use Yori at all and why I love it so much =)
It is simple... It is straightforward... It simply works! Everywhere!!! I use it just for that... You script one time and it works!
So I can write script.cmd
put it together with x86
, x64
, arm64
and even arm32
builds and in scenario with oneyori.exe
all that setup will be 2 megabytes in size... it is perfect! Since I can be pretty sure that my script.cmd
will use native tools on every platform and I simply can avoid all that stuff with folder redirection on x64
windows and x86
tools and much more, I just do not want to remember all that stuff that I can avoid with Yori! This will work even in Windows PE which by design do not support execution of other platform architecture tools! So I basically use Yori as a perfectly portable scripting tools
And absolutely no doubt oneyori.exe
is a perfect tool for this case and every case I can imagine... but the one reason I avoid to use it is that oneyori.exe
for x86
platform is fakely detected with antivirus software
Obviously there is no virus at all and this is completely false alarm... but this stops me to use oneyori.exe
and prefer to use yori.exe
with externals since there is no false alarms on these and by the way only oneyori.exe
for x86
is false alarmed, other builds for x64
, arm64
and arm32
is clean by every antivirus on VirusTotal!
So basically I can not send my script along with oneyori.exe
builds to someone and avoid situation where have to say that there it no virus and this is false alarm... this is simply confusing people!
I've dig to it a couple of weeks ago and find an opinion that DLL-injection may cause this false alarms. Sorry, I can not provide you a link, since this was long time ago! But obviously there is a reason why that false alarms exists... and I simply see yuishook.dll
file in Yori's directory! I do not know maybe this DLL is injected... so anyway this is just a theory and assumption
So maybe we can add alternate build of Yori with all builtins that will not be false alarmed and maybe consist of two files: twoyori.exe
and twoyori.dll
=)
And by the way I can help to determine what causes this false alarm... my logic here is simple:
yori.exe
is clean, but oneyori.exe
is false alarmed... so basically there is a thing that oneyori.exe
get compiled with and this causes a false alarm! So if you can tell me the file or maybe files where imports to oneyori.exe
is defined I can do this step by step: remove one imported built-in command, compile and check by VirusTotal... this is not hard for may and may be can help
Oneyori is constructed in sh\yorifull.c and sh\yorifull.lst . The first file defines builtin command functions and has a table of strings mapping to them, and the second defines how to link the code.
That said, I'm not seeing virustotal flag anything in the 1.40 oneyori.exe binary, but it really looks like none of the scanners have seen it before: https://www.virustotal.com/gui/file/acf01971ac3813548ec4ded974522b972cae2adf6400196cb6c68f68d31d945c/detection
As far as I know, the most problematic parts for virus scanners is the update code (in ysetup, ypm, and yget.) This is very boring code that wraps wininet.dll in the simplest possible way to download updates. Unfortunately, virus downloaders also wrap wininet.dll in the simplest possible way to download code, so it's hard for virus scanners to tell the difference. They appear to do this with thumbprint heuristics that describe patterns in "good" and "bad" versions of what is essentially the same code. I wouldn't be surprised at all if using a different compiler results in different outcomes.
Thaks, jus figured out how yorifull.c
and yorifull.lst
corresponds... and now I do not think that there problems only with code that uses wininet
... so I've just removed all from yorifull.lst
and corresponding lines from yorifull.c
and left only this in yorifull.lst
:
..\assoc\builtins.lib
Than compiled and uploaded oneyori.exe
to VirusTotal and get no false positives at all... then I've added yattrib
to yorifull.lst
and yorifull.c
, compiled uploaded and get false alarm... removed yatrrib
and add cal
... this time there was no false positive! So now I am going to add all commands one by one...
Adding attrib
, cab
, clip
to yorifull.lst
and yorifull.c
results in .exe that gets false alarm... strange enough is that if for example I remove all and compile oneyori.exe
with attrib
only it gets no alarms, but if I add assoc
and attrib
then there is alarm...
At this time I have only this in my yorifull.lst
:
..\assoc\builtins.lib
..\cal\builtins.lib
..\charmap\builtins.lib
..\cls\builtins.lib
And compiled exe is not detected as virus on VirusTotal at all... so I am going to do it for all and will post the result later
This is completely weird and now I feel that I am not in the right direction... so when it come to ECHO and I added it to yorifull.c
and yorifull.lst
compiled with it and tested I get a false alarm! And than I thought that ECHO is already builtin in yori.exe
and things become really strange since there is no false alarm on yori.exe
... and things becomes weirder and weirder with below:
..\assoc\builtins.lib
..\cal\builtins.lib
..\charmap\builtins.lib
..\cls\builtins.lib
no false alarm
..\which\builtins.lib
..\wininfo\builtins.lib
..\winpos\builtins.lib
no false alarm
but together:
..\assoc\builtins.lib
..\cal\builtins.lib
..\charmap\builtins.lib
..\cls\builtins.lib
..\which\builtins.lib
..\wininfo\builtins.lib
..\winpos\builtins.lib
There IS false alarm!
I build Yori with Visual Studio 2019 Developer Command Prompt v16.8.2 and v14.16.27043.0 compiler tools
So finally I gave up =) No logic at all...
..\assoc\builtins.lib
no false alarm
..\attrib\builtins.lib
no false alarm
..\assoc\builtins.lib
..\attrib\builtins.lib
FALSE ALARM =)
Now I've tested your builds which is downloaded with YPM
and oneyori.exe
for x86
goes through the VirusTotal without any alarms... So I'm just going to switch to your builds instead of building it on my own, the only question I have here is how frequent download-daily
is updated? Current files dated 18 Jan 2021... so the question is:
Is currently published source on github is download-daily
builds?
I'm not surprised. The problem with heuristic detection based on previously observed "known bad" or "known good" binaries is that small, unrelated changes cause a binary to look less like a "known good" binary and more likely to be flagged.
As you say, download-daily
aren't really daily. They're the builds that I'm testing, which often comprise things that are in github but also changes that aren't in github yet and I'm trying to validate they work well enough to publish. But note that I have the same AV problem you do - because these builds add things like attrib, there's no guarantee that a heuristic based virus scanner won't complain about them.
If you want to generate builds that are very similar to mine, I'm using Visual C++ 6.0 SP5 with the processor pack (which contains MASM) for 32 bit builds, and the Windows 2003 SP1 SDK for 64 bit builds. These older compilers ensure the binaries will run on anything and can't depend on newer technologies without being manually coded to check for them.
Hello!
I understand everything has it's limits and not insist to get this into primary goal... this is just a remark!
I am seeing a sharp drop in performance using decent amount of stream input and output.
This is only for example but let's assume we have a text file with such content:
replace.txt
So in real life we have a text file encoded with non-english ASCII symbols.
And we need to replace all those 36 values with text and going to do so with following script with a lot of pipes:
replace.ys1
All works... but: `timethis.exe yori -nouser -c YS "replace.ys1"
12 seconds certainly not a big deal but if process comes to real text file with normal contents speed degrades drastically...
The thing here is if we split those 36 REPL pipes into 36 different lines:
and do that again with: `timethis.exe yori -nouser -c YS "replace.ys1"
So despite there was 36 disk reads 72 encoding conversions and 36 disk writes it is 6 times faster than 1st variant... one problem here is we need to check an ERRORLEVEL after each of 36 lines to ensure that all was correctly.
And by the way in real life with 30 lines in script we get speed something around 5 seconds per file and with one line in script with 30 pipes speed drops to 30 seconds per file!
And this is not about
REPL
only... I've tried a lot of sequiental pipes around multipeICONV
and it seems it degrades in the same way!So it is basically 6 times faster everywhere I've seen, tried this with
CMD
YORI
YORIMIN
andONEYORI
... and the numbers always almost the same!And just for example tried this in
CMD
with externalfindstr.exe
:So there is 128 pipes and nothing seems to degrade significantly... I am basically want to say that as I see this is not due to Windows piping mechanism!
Or maybe you can advice how to speed things up?
P.S. I'm so sorry to be so annoying =) I believe I am doing useful things for this project