vn-tools / arc_unpacker

CLI tool for extracting images and sounds from visual novels.
GNU General Public License v3.0

Consider technology shift #5

Closed rr- closed 9 years ago

rr- commented 9 years ago
  1. Some areas suffer from very poor performance.
  2. This has prompted me to start creating native extensions in C instead of writing everything in plain Ruby.
  3. The obvious advantage lies in the gained performance.
  4. There are, however, numerous disadvantages to such an approach:
    1. The codebase becomes impure: it mixes Ruby with C.
    2. It imposes steep requirements on the end user - he must have gcc and make available. I assume it is a great PITA to get this right on Windows (used by the target audience) without using Cygwin.
    3. There is RubyInline, but it causes multiple problems on every machine I try to install it on (surprisingly, the biggest problems kept happening on Debian). It doesn't resolve the issues mentioned in the previous points either.
    4. Even if I port the most performance-sensitive parts to C, the remaining code is still quite slow.

The biggest issue with this is that when I started this project, I chose Ruby because I wanted it to be as clean and nice as possible... to developers. Now I feel that, as a byproduct, this tool can be used only by developers.

My suggestions are as follows:

erengy commented 9 years ago

I've previously written a similar multipurpose tool in C++. It did work fast, and it could have worked even faster, but I couldn't help questioning whether I actually needed that kind of performance. If I were to write it again now, I'd probably go with Python (I'm not familiar with Ruby) or try out Go. Python would gather more contributors, if any. Go would run considerably faster.

How slow is quite slow? Unpacking of an archive is rarely done twice, so that shouldn't be an issue. If the packing process is not unreasonably slow, then the shift may not be worth it.

PS: Rust has just hit v1.0 alpha, which is reassuring in terms of stability.

rr- commented 9 years ago

Hmm... when I implement stuff such as an LZSS compressor in Ruby (which uses bit-level arithmetic), it can take up to 40-50 minutes to convert all the graphic files, while the C-powered version crunches everything down in about 2 minutes (yay for unsafe type casting). That's why I keep implementing compressors in C while implementing everything else in Ruby.

That is to be expected, though: C operations such as >> and pointer arithmetic translate almost directly into machine instructions such as shr and lea, while Ruby has to emulate everything in its VM.

I'd go with either Go or Rust. Go seems promising with regard to short compilation times. Although I was aware Rust was going to hit alpha soon, I wasn't aware that they had bold plans to release 1.0 final in, like, just a few months.

rr- commented 9 years ago

By the way, I'm considering withdrawing support for compressing/packing.

The reason I keep implementing packers is that they make unit testing really easy: assert stuff == unpack(pack(stuff)). But the truth is that:

erengy commented 9 years ago

40-50 minutes? It definitely extends beyond being unreasonable, then. As a wise woman once said, ain't nobody got time for that.

I think it boils down to what your intentions are, and how the tool is supposed to be used. Having a nice and clean codebase is quite helpful when another developer wants to extend the functionality or fix a bug. As long as you don't use a relatively unknown language such as OCaml, it should be fine.

That said, even though most people can figure out how to set up a development environment and use the command line, non-developers will always prefer having a simple executable file in hand, preferably with a GUI (e.g. AnimED, Crass, ExtractData). This is also true for developers, actually. I don't mind this when I'm working on a translation project, but when I just want to quickly extract the contents of an eroge, I'd rather drag-and-drop the archive onto a window and be done with it.

rr- commented 9 years ago

40-50 minutes if I use Ruby, though. I do the critical stuff in C, so it's sort of acceptable. Regarding the purpose of the tool: frankly, most of the archives I support so far can be extracted using other tools, so I guess it boils down to this:

Standards

This, and personally, I consider GUIs to be total bloat most of the time. Converting some files is definitely one of those cases. The majority of tools out there do #include <windows.h>... why? arc_unpacker supports drag'n'drop even though it's a CLI. Its only dependency is rmagick, which is probably going to go away after I switch languages.

Like I said in the ticket, I'll give it some more time, and when I feel up to the task, I'll try out Rust and Go. They should allow me to:


Dropping packing support seems reasonable, since (I think) every translation project needs its own hacker anyway. Giving them the source code lets them build their own tools and set up whatever environment they want, and reversing the unpacking logic by reading the source code shouldn't be too difficult.

rr- commented 9 years ago

I finally completed the research, and here are my thoughts from the standpoint of this project:

  1. Scripting languages are slow and cannot be compiled to a standalone .exe, thus making the target audience even smaller than it already is.
  2. Go's main advantage lies in compile speeds and parallel processing. Go's toolchain, workspace management and the requirement to set up $GOPATH are a deal-breaker to me.
  3. Rust has super weird syntax. I could get over it, but there's another huge disadvantage - compiling hello world results in a 3.5 MB exe, which I find totally unacceptable. It might improve in the future, but we're talking about here and now.
  4. D, Ada and others: too exotic.
  5. C++. Bloatware.
  6. C. Yeah... C.

I checked out what developing in vanilla C feels like. After winning an epic fight with the necessary evil that is the makefile, I found development in C to be... kind of calming.

I'll go with C. Results will be committed to the c branch.

erengy commented 9 years ago

I agree with most of the points, but I'd argue that C++ brings more to the table with no practical cost. You can always pick and choose which features of C++ to use, and continue coding in C where you see fit. As a result, you can write less code and spend less time on dealing with pointers and stuff. I don't have a strong opinion on the matter though, and you should totally use C if you enjoy it more.

rr- commented 9 years ago

I learned you were right the hard way.

At first, everything went smoothly: I had total control over the program, nothing was happening under the hood, etc. Memory footprint and executable sizes were minimal.

Then I wanted my program not to SIGSEGV when things went bad (e.g. the archive was corrupted). Since C doesn't have an exception model, I need to check every function's return value, always. Not only is this tiresome, it's also totally counterproductive because it makes refactoring more difficult. The only alternative is to use longjmp, or a nice wrapper around longjmp such as e4c that introduces try-catch-finally keywords to C. This, however, means I need to be extra careful about my mallocs and put the corresponding frees inside finally blocks, otherwise I'll leak memory on exceptions (and won't even know about it). So my code still needs to be very verbose, just in another way.

C++ used to suffer from the same problems... until C++11 introduced smart pointers. These should allow me to write almost assert-free code, which sounds great.
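A minimal side-by-side sketch of the two styles (the file-loading scenario is made up for illustration, not this project's code): the C-style function has to check and clean up on every exit path, while the C++11 version leans on RAII, so an exception thrown anywhere after the declarations still releases both resources.

```cpp
#include <cstdio>
#include <cstdlib>
#include <memory>
#include <stdexcept>

// C style: every call checked, every exit path must release what was
// acquired. Forgetting one fclose during a refactor silently leaks.
int load_c_style(const char* path) {
    std::FILE* fp = std::fopen(path, "rb");
    if (!fp)
        return -1;
    char* buf = static_cast<char*>(std::malloc(4096));
    if (!buf) {
        std::fclose(fp);
        return -1;
    }
    size_t n = std::fread(buf, 1, 4096, fp);
    // ... real decoding would add a check (and a cleanup path) per step ...
    std::free(buf);
    std::fclose(fp);
    return static_cast<int>(n);
}

// C++11 style: RAII owns the resources, so a throw below the
// declarations still closes the file and frees the buffer.
struct FileCloser { void operator()(std::FILE* f) const { std::fclose(f); } };

size_t load_cpp_style(const char* path) {
    std::unique_ptr<std::FILE, FileCloser> fp(std::fopen(path, "rb"));
    if (!fp)
        throw std::runtime_error("cannot open file");
    std::unique_ptr<char[]> buf(new char[4096]);
    return std::fread(buf.get(), 1, 4096, fp.get());
}
```

(std::make_unique would tidy the allocation further, but it only arrived in C++14, so the sketch sticks to plain new[] inside a unique_ptr.)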

rr- commented 9 years ago

Finally done.