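[Backend] DL server design and architecture diagram

Main topic: document the design of the DL server and the reasoning behind each of its APIs.

Subtopic: explain why the server is built on multi-threading, and summarize the libraries it uses.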

Related issues

User Server design and architecture diagram

model version server design and architecture diagram

API description

This server provides the two APIs below.

When the server starts, it calls GetUsingModelVersion on api_model_version to fetch the information of the model currently in use, and loads that model in advance.

Inference

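The client converts the image PIL.Image.Image -> BytesIO -> bytes and sends it to the server.

The server converts it back, bytes -> BytesIO -> PIL.Image.Image, and keeps the result.

It extracts the Exif data from the image and rotates the image accordingly before inference.

The model then runs inference and the result is returned to the client.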
LoadModel

Takes a path as input and loads the weights of a new model.
Exif data references

https://sga8.tistory.com/7

https://stackoverflow.com/questions/13872331/rotating-an-image-with-orientation-specified-in-exif-using-python-without-pil-in
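
The Exif rotation step of Inference can be sketched as follows. This is a minimal sketch that assumes Pillow's built-in `ImageOps.exif_transpose` is acceptable; the function name `load_upright_image` is hypothetical, and the actual server may read the orientation tag by hand as in the references above.

```python
from io import BytesIO

from PIL import Image, ImageOps


def load_upright_image(image_bytes: bytes) -> Image.Image:
    """Decode raw image bytes and apply the Exif orientation, if any.

    Hypothetical helper for illustration; not the server's actual function.
    """
    image = Image.open(BytesIO(image_bytes))
    # exif_transpose rotates/flips the image according to the Exif
    # Orientation tag (0x0112) and returns the corrected image.
    return ImageOps.exif_transpose(image)
```

With an orientation value of 6 (rotated 90 degrees), a 40x20 image comes back as 20x40, so the model always sees an upright image.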

AuthenticateUserAccessToken middleware

User verification (access_token)

A request is sent to Authenticate on the api_user server to verify the user based on the access_token.

This runs as middleware (an interceptor), AuthenticateUserAccessToken(grpc.aio.ServerInterceptor),

so the user is authenticated before the request reaches the API handler.
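
A minimal sketch of such an interceptor is shown below. It assumes the token travels in request metadata under the key `access_token`, and that the call to api_user's Authenticate is wrapped in a hypothetical async callable `verify_token`; neither detail is confirmed by this issue. The deny handler here only covers unary-unary RPCs.

```python
from typing import Awaitable, Callable

import grpc


class AuthenticateUserAccessToken(grpc.aio.ServerInterceptor):
    """Rejects calls whose access_token does not pass verification."""

    def __init__(self, verify_token: Callable[[str], Awaitable[bool]]) -> None:
        # verify_token is a hypothetical wrapper around api_user's Authenticate.
        self._verify_token = verify_token

        async def deny(request, context):
            await context.abort(
                grpc.StatusCode.UNAUTHENTICATED, "invalid access_token"
            )

        # Handler returned instead of the real one when verification fails.
        self._deny_handler = grpc.unary_unary_rpc_method_handler(deny)

    async def intercept_service(self, continuation, handler_call_details):
        # Metadata arrives as (key, value) pairs; the key name is an assumption.
        metadata = {
            key: value
            for key, value in (handler_call_details.invocation_metadata or ())
        }
        token = metadata.get("access_token", "")
        if token and await self._verify_token(token):
            # Token accepted: hand over to the real handler.
            return await continuation(handler_call_details)
        return self._deny_handler
```

The interceptor would be passed to the server at construction time, for example `grpc.aio.server(interceptors=[AuthenticateUserAccessToken(verify_token)])`.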

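Why multi-threading was chosen

It is commonly assumed that, because of the GIL, multi-threading in Python is slower than multi-processing.

However, libraries such as torch and numpy drop down to C or C++ for their computations, and typically release the GIL while that native code runs,

so multi-threading is not as far behind multi-processing as is often thought, and I was able to confirm that it can actually perform better.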

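This is a problem I ran into during a previous internship.

It is not reproduced in this project, but at that time I used ab (the Apache Benchmark tool) to send 10,000 requests for each configuration,

and confirmed the result by comparing the measured numbers.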

Loading the model once, keeping it on the heap, and serving every request through that single global model also uses memory more efficiently, so this approach should give good results.
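
The load-once, shared-model pattern can be sketched with the standard library alone. `DummyModel` is a hypothetical stand-in for the real torch model and `serve_all` is an illustrative name; the point is only that every worker thread reuses the same object.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class DummyModel:
    """Hypothetical stand-in for a torch model; counts how often it is built."""

    load_count = 0

    def __init__(self) -> None:
        DummyModel.load_count += 1
        self.weights = [0.5, 1.5]

    def predict(self, x: float) -> float:
        return sum(w * x for w in self.weights)


_model = None
_model_lock = threading.Lock()


def get_model() -> DummyModel:
    """Return the single shared model, loading it on first use only."""
    global _model
    if _model is None:
        with _model_lock:
            if _model is None:  # re-check after acquiring the lock
                _model = DummyModel()
    return _model


def handle_request(x: float) -> float:
    return get_model().predict(x)


def serve_all(inputs, max_workers: int = 4):
    """Process all inputs on a thread pool that shares one model."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handle_request, inputs))
```

However many threads serve requests, the model is constructed once, which is where the memory saving over one-model-per-process comes from.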

If multi-processing is required, do not define the model and load its parameters separately in every process.

Instead, design it so that the model is loaded only once and shared by all processes.

References:

https://discuss.pytorch.org/t/can-pytorch-by-pass-python-gil/55498

https://www.youtube.com/watch?v=m2yeB94CxVQ&list=LL&index=2&t=48s

Load only what is needed (the model weights) on the server.

Packaging the model as a .tar file can be convenient, but it may also pull unnecessary data into server memory,

so extract_model_weight.py extracts the model weight parameters into a .pth file, and the server loads only that file (the model weights).
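
The extraction step might look like the sketch below. It assumes the training checkpoint is a dict that stores the weights under a `state_dict` key next to other entries such as the epoch; the key name and the function signature are assumptions, not the contents of the real extract_model_weight.py.

```python
import torch


def extract_model_weight(
    checkpoint_path: str, output_path: str, weight_key: str = "state_dict"
) -> dict:
    """Copy only the model weights out of a full training checkpoint.

    weight_key is an assumed key name; adjust it to the checkpoint layout.
    """
    # map_location="cpu" avoids needing a GPU just to re-save the weights.
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    state_dict = checkpoint[weight_key]
    # Everything else in the checkpoint (epoch, optimizer state, ...) is dropped.
    torch.save(state_dict, output_path)
    return state_dict
```

The resulting .pth file can then be applied to a model with `model.load_state_dict(torch.load(output_path))`.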

If this model were actually deployed:

Scaling out with several AWS EC2 t2.small instances (vCPUs: 1, memory: 2 GiB)

would likely give better performance than raising the spec of a single server.

It would probably be the better choice in terms of cost as well.

Things to keep in mind when serving a PyTorch model

model.eval()

model.eval() will notify all your layers that you are in eval mode, that way, batchnorm or dropout layers will work in eval mode instead of training mode.

In eval mode, dropout is disabled and batch normalization uses the running statistics saved during training.
```
model = load_model_weight_from_pth(model, model_file_name=model_file_name)
model.eval()
```
with torch.no_grad()

impacts the autograd engine and deactivate it. It will reduce memory usage and speed up computations but you won’t be able to backprop (which you don’t want in an eval script).

It skips the computation that is only needed for backpropagation, which speeds up processing.
```
with torch.no_grad():
  output = model(tensor_image)  # [1, 4], inference_time : 0.20484089851379395
```

https://discuss.pytorch.org/t/model-eval-vs-with-torch-no-grad/19615/5
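
The effect of combining the two can be checked with a small stand-in network. This is a sketch only: the layers below are arbitrary and are not the model served by this project.

```python
import torch
from torch import nn

# Arbitrary stand-in network that contains a dropout layer.
model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 4))
model.eval()  # dropout becomes a no-op, so outputs are deterministic

tensor_image = torch.ones(1, 4)
with torch.no_grad():  # no autograd graph is recorded inside this block
    first = model(tensor_image)
    second = model(tensor_image)

same_output = torch.equal(first, second)  # identical because dropout is off
tracks_grad = first.requires_grad         # False because of no_grad
```

Without `model.eval()` the two outputs would generally differ, and without `torch.no_grad()` the output would carry gradient history.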

Understanding with torch.no_grad() in detail

network train and eval


Image requests and responses in gRPC

As far as I can tell, there are two ways to send an image.

1. Send it as bytes
2. Base64-encode it into a string and send that

Method 1 is generally more efficient in both time and memory, but method 2 is still used occasionally,

so both are documented here.

Sending as bytes

As shown below, the PIL image is written into a BytesIO, converted PIL -> bytes, and sent.

client.py

from io import BytesIO

import PIL.Image
from PIL import Image

image: PIL.Image.Image = Image.open("./sample/sample1.jpg")
image_file: BytesIO = BytesIO()
image.save(image_file, format="PNG")
image_bytes: bytes = image_file.getvalue()

# -----------------------------------------------------
# Alternatively, send the file's bytes exactly as they are on disk.
# (Re-encoding as above may drop the Exif data; reading the file keeps it.)
with open("./sample/sample1.jpg", "rb") as image_file_handle:
    image_bytes: bytes = image_file_handle.read()

server.py

Rebuild the Image from the bytes received from the client and use it as-is.

def convert_bytes_image2pil_image(bytes_image_content: bytes) -> PIL.Image.Image:
    image_file: BytesIO = BytesIO(bytes_image_content)
    image: PIL.Image.Image = Image.open(image_file)
    return image

def save_convert_bytes_image2pil_image(bytes_image_content: bytes, save_file_name="check.jpg") -> None:
    image_file: BytesIO = BytesIO(bytes_image_content)
    with open(save_file_name, "wb") as f:
        f.write(image_file.getbuffer())

Base64-encoding into a string and sending

As below, base64 encoding and decoding can be used to convert bytes -> string.

However, given the conversion time and the added size (the payload grows to about 4/3 of the original), sending the bytes directly as in method 1 costs less.

The main purpose of Base64 is simply to represent binary data using characters that fit in ASCII.
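
The 4/3 figure follows from Base64 mapping every 3 input bytes to 4 output characters. A small check using only the standard library (the helper name `base64_overhead` is illustrative):

```python
import base64


def base64_overhead(payload: bytes) -> float:
    """Return encoded size divided by raw size for a non-empty payload."""
    encoded = base64.b64encode(payload)
    return len(encoded) / len(payload)


raw = bytes(range(256)) * 12   # 3072 bytes of sample binary data
encoded = base64.b64encode(raw)  # 3072 / 3 * 4 = 4096 characters
ratio = base64_overhead(raw)     # 4096 / 3072, i.e. 4/3
```

When the length is not a multiple of 3, padding rounds the output up to the next multiple of 4, so small payloads can grow by slightly more than a third.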

client.py

import base64
from io import BytesIO

import PIL.Image
from PIL import Image

image: PIL.Image.Image = Image.new('RGB', (224, 224), (127, 127, 127))
image_file: BytesIO = BytesIO()
image.save(image_file, format="PNG")
image_bytes: bytes = image_file.getvalue()
# b64encode returns bytes, so decode it to get the str that is sent.
b64image: str = base64.b64encode(image_bytes).decode("ascii")

# -----------------------------------------------------
# Alternatively, image_bytes can be read straight from a file.
with open("./sample/sample1.jpg", "rb") as image_file_handle:
    image_bytes: bytes = image_file_handle.read()

server.py

def convert_b64image2pil_image(b64image: str) -> PIL.Image.Image:
    image_bytes: bytes = base64.b64decode(b64image)
    image_file: BytesIO = BytesIO(image_bytes)

    image: PIL.Image.Image = Image.open(image_file)
    return image

def save_convert_b64image2pil_image(b64image: str, save_file_name="check.jpg") -> None:
    image_bytes: bytes = base64.b64decode(b64image)
    image_file: BytesIO = BytesIO(image_bytes)

    with open(save_file_name, "wb") as f:
        f.write(image_file.getbuffer())

References

https://hyoje420.tistory.com/1

https://ghdwn0217.tistory.com/76

heojae / FoodImageRotationAdmin